• DocumentCode
    3408317
  • Title
    Recurrence time statistics: versatile tools for genomic DNA sequence analysis
  • Author
    Cao, Yinhe; Tung, Wen-wen; Gao, J.B.
  • Author_Institution
    Biosieve, Campbell, CA, USA
  • fYear
    2004
  • fDate
    16-19 Aug. 2004
  • Firstpage
    40
  • Lastpage
    51
  • Abstract
    With the completion of the genomes of humans and a few model organisms, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures in, and extract features from, DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. These often have to be masked before the protein coding regions along a DNA sequence can be identified, or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it has the salient features of being largely species-independent and of working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or noncoding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient: it allows genomes to be analyzed on a whole-genome scale using a PC.
  • Keywords
    DNA; biology computing; genetics; molecular biophysics; C. elegans; E. coli; Homo sapiens; S. cerevisiae; codon index; computational tools; expressed sequence tags; gene finding algorithms; genomic DNA sequence analysis; nematode worm; protein coding regions; recurrence time statistics; repeat-related features; yeast; Bioinformatics; DNA computing; Feature extraction; Genomics; Humans; Organisms; Proteins; Sequences; Statistical analysis; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
  • Print_ISBN
    0-7695-2194-0
  • Type
    conf
  • DOI
    10.1109/CSB.2004.1332415
  • Filename
    1332415