Gene Number Comparison
Creating a gene number cross reference between versions is complicated by any or all of the following:
- Two genes become one - As contigs assemble, a partial gene at the end of one contig can assemble with a partial gene at the end of a different contig.
- Two genes become one - As the sequence is polished, changes to a stop, insertions, and deletions can cause two genes to be in frame and now form one gene.
- One gene becomes two - As the sequence is polished, insertions, deletions, and new stops can cause a gene to be split between two frames and now be two gene calls.
- A gene disappears - As the sequence is polished, a frame change or stop occurs so close to the start of a gene that it doesn't get predicted in the new assembly.
- A gene gets added - A new sequence may have new opportunities for genes which were not present in the previous build.
A tab-delimited comparison file is available to aid in the development of a cross reference. It is to be used as a starting point and not viewed as a final list. It was generated using a BLAST analysis from old to new and from new to old and has not been reviewed by human eyes. The list has not been sorted.
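The two-direction comparison described above can be pictured with a small sketch. It is illustrative only and is not the script that produced the file: the function name `classify_hits` and the two input dictionaries are assumptions, each mapping a gene to its single top BLAST hit in the other assembly. The gene names are taken from the example on this page.

```python
def classify_hits(new_to_old, old_to_new):
    """Split top hits into reciprocal pairs and one-way hits.

    new_to_old: new-assembly gene -> its top hit in the old assembly (assumed input)
    old_to_new: old-assembly gene -> its top hit in the new assembly (assumed input)
    """
    two_way = []
    one_way = []
    for new_gene, old_gene in new_to_old.items():
        if old_to_new.get(old_gene) == new_gene:
            # Each gene lists the other as its top hit.
            two_way.append((new_gene, old_gene))
        else:
            one_way.append((new_gene, old_gene, "new to old"))
    for old_gene, new_gene in old_to_new.items():
        if new_to_old.get(new_gene) != old_gene:
            one_way.append((new_gene, old_gene, "old to new"))
    return two_way, one_way


# Two finished genes both point at the same draft gene,
# but the draft gene can only point back at one of them.
new_to_old = {"Asuc1441": "Draft0910", "Asuc1586": "Draft0910"}
old_to_new = {"Draft0910": "Asuc1586"}
two_way, one_way = classify_hits(new_to_old, old_to_new)
```

Only the pair in which each gene names the other ends up in `two_way`; every other top hit is kept as a one-way hit together with its direction.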
The columns are:
- Gene from the new assembly
- Length of gene from new assembly
- Gene from the old assembly
- Length of gene from old assembly
- Type of hit:
- When a gene from the old assembly and a gene from the new assembly list each other as their top hit, they are listed as "Two-way best hit".
- If the best hit is one way only, the entry shows the best hit in that one direction. For example, a gene in the Draft may have a best BLAST hit in the Finished assembly but not the other way around. This often occurs when there are two or more similar genes, as in this case:
| Finished Gene | Length | Draft Gene | Length | Type Hit | Length Comparison | Percent Identity |
| Asuc1441 | 356 | Draft0910 | 334 | Finished hit Draft | longer | 38 |
| Asuc1586 | 334 | Draft0910 | 334 | Two-way best hit | Same | 100 |
In this case, gene Asuc1441 has Draft0910 as its best BLAST hit in the draft, but the hit is not reciprocated because Draft0910 is a much better match to Asuc1586. The two finished genes Asuc1441 and Asuc1586 are probably similar, but only one of them was represented in the draft version.
- If there was no BLAST hit above the cutoff, the entry says "no hit".
- Comparison of the two gene lengths (e.g., "longer" or "Same")
- Percent Identity over the length of the alignment
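As a starting point for working with the file, the columns listed above can be read into records and the two-way best hits pulled out first. The following is a minimal sketch under stated assumptions: the field names in `COLUMNS`, the helper names, the absence of a header row, and the exact spelling of the hit-type labels are all assumptions, and the sample rows simply repeat the example on this page.

```python
import csv
import io
from collections import Counter

# Assumed field names, in the column order described above.
COLUMNS = ["new_gene", "new_length", "old_gene", "old_length",
           "hit_type", "length_comparison", "percent_identity"]


def read_comparison(handle):
    """Read tab-delimited rows into dicts keyed by the assumed column names."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def starting_cross_reference(rows):
    """Return (old -> new map of two-way best hits, old genes needing review)."""
    two_way = {r["old_gene"]: r["new_gene"]
               for r in rows if r["hit_type"] == "Two-way best hit"}
    # An old gene that appears in more than one hit row may reflect similar
    # genes, a split, or a merge, so flag it for manual review.
    counts = Counter(r["old_gene"] for r in rows
                     if r["old_gene"] and r["hit_type"] != "no hit")
    needs_review = sorted(g for g, n in counts.items() if n > 1)
    return two_way, needs_review


sample = ("Asuc1441\t356\tDraft0910\t334\tFinished hit Draft\tlonger\t38\n"
          "Asuc1586\t334\tDraft0910\t334\tTwo-way best hit\tSame\t100\n")
rows = read_comparison(io.StringIO(sample))
xref, needs_review = starting_cross_reference(rows)
```

For a real file, `io.StringIO(sample)` would be replaced with an open file handle. Because the list has not been reviewed, anything in `needs_review` and every one-way or "no hit" row still has to be checked by hand.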