DocumentCode
3408317
Title
Recurrence time statistics: versatile tools for genomic DNA sequence analysis
Author
Cao, Yinhe ; Tung, Wen-wen ; Gao, J.B.
Author_Institution
Biosieve, Campbell, CA, USA
fYear
2004
fDate
16-19 Aug. 2004
Firstpage
40
Lastpage
51
Abstract
With the genomes of the human and a few model organisms completed, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can readily identify structures in, and extract features from, DNA sequences. Repeat-related structures are among the more important structures in a DNA sequence. Often they have to be masked before protein coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or noncoding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient: it allows whole-genome analysis to be carried out on a PC.
Keywords
DNA; biology computing; genetics; molecular biophysics; C. elegans; E. coli; Homo sapiens; S. cerevisiae; codon index; computational tools; expressed sequence tags; gene finding algorithms; genomic DNA sequence analysis; nematode worm; protein coding regions; recurrence time statistics; repeat-related features; yeast; Bioinformatics; DNA computing; Feature extraction; Genomics; Humans; Organisms; Proteins; Sequences; Statistical analysis; Statistics
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN
0-7695-2194-0
Type
conf
DOI
10.1109/CSB.2004.1332415
Filename
1332415