DocumentCode :
1987897
Title :
On gene prediction by cross-species comparative sequence analysis
Author :
Chen, Rong ; Ali, Hesham
Author_Institution :
Dept. of Comput. Sci., Nebraska Univ., Omaha, NE, USA
fYear :
2003
fDate :
11-14 Aug. 2003
Firstpage :
446
Lastpage :
447
Abstract :
Sequencing of large fragments of genomic DNA makes it possible to compare genomic sequences in order to identify protein-coding regions. We have conducted a comparative analysis of homologous genomic sequences of organisms at different evolutionary distances and determined the degree of conservation of the noncoding regions between closely related organisms. In contrast, more distantly related organisms show much less intron similarity but only somewhat less conservation of exon structures. Based on this finding and on training data sets, we proposed a model by which coding sequences could be identified by comparing sequences of multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared to those obtained by other popular gene prediction programs. Provided sequences can be found from other species at appropriate evolutionary distances, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models are available.
Keywords :
DNA; cellular biophysics; evolutionary computation; genetics; molecular biophysics; physiological models; proteins; cross-species comparative sequence analysis; degree of conservation; evolutionary distances; exon structure; gene prediction; genomic DNA; homologous genomic sequences; intron; noncoding regions; protein-coding region identification; species-dependent statistical models; Bioinformatics; DNA; Genomics; Humans; Mice; Organisms; Proteins; Sensitivity and specificity; Sequences; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
Type :
conf
DOI :
10.1109/CSB.2003.1227366
Filename :
1227366