DocumentCode :
419339
Title :
Deriving a novel codon index by combining period-3 and fractal features of DNA sequences
Author :
Qi, Yan ; Gao, Jianbo ; Cao, Yinhe ; Tung, Wen-wen
Author_Institution :
Dept. of Biomed. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2004
fDate :
16-19 Aug. 2004
Firstpage :
531
Lastpage :
532
Abstract :
Summary form only given. A gene-finding algorithm becomes more successful when it incorporates multiple useful and non-redundant sources of information about coding regions, so it is highly desirable to find new and efficient codon indices. Here we propose a novel codon index, which we call the period-3 fractal deviation (PFD). It is obtained by simultaneously considering two incompatible features of DNA sequences: the period-3 feature of coding regions and the fractal feature of both coding and non-coding regions. These two features are incompatible because period-3 defines a specific scale of three nucleotide bases, whereas fractal implies the absence of any specific scale. The PFD differs markedly between coding and non-coding sequences and is reading-frame-dependent. The accuracy of the PFD is evaluated on all 16 yeast chromosomes. The percentage accuracy is found to be very high and largely independent of the sliding-window size. It is also much higher than when the period-3 and fractal features are characterized alone, especially when the window size is small. This strongly suggests that the method is not only useful for studying long genome sequences but may also be very powerful for studying short DNA segments. The PFD is complementary to other codon indices, including Fourier measures of period-3, which makes it possible to integrate the PFD with other measures. Indeed, integrating the PFD measure with those indices using Fisher linear discriminant analysis significantly improves the accuracy of protein-coding sequence identification; this implies that the measure proposed here may be readily incorporated into existing gene-finding algorithms. Other salient features of the method are that it is non-parametric, does not require training, and can be fully automated.
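The record does not give the PFD formula, so the sketch below does not implement the PFD. It is a minimal illustration, under stated assumptions, of the two generic building blocks the abstract mentions: a standard Fourier period-3 measure computed on sliding windows (the kind of index the PFD is said to complement) and a two-feature Fisher linear discriminant used to combine indices. The helper names `period3_snr`, `sliding_period3`, and `fisher_direction` are hypothetical; windows are assumed to contain only A/C/G/T and to have a length that is a multiple of 3.

```python
import cmath
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def period3_snr(seq: str) -> float:
    """Standard Fourier period-3 measure (not the PFD; hypothetical helper).

    Sums the DFT power at frequency 1/3 of the four binary indicator
    sequences (A, C, G, T) and divides by the window length N. By Parseval's
    theorem, N is the average power over all DFT bins when the window holds
    only A/C/G/T, so the result is a signal-to-noise ratio.
    Assumes len(seq) is a multiple of 3 so frequency 1/3 is DFT bin N/3.
    """
    seq = seq.upper()
    w = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency 1/3
    power = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence u_base[n] at frequency 1/3
        coeff = sum(w ** n for n, ch in enumerate(seq) if ch == base)
        power += abs(coeff) ** 2
    return power / len(seq)


def sliding_period3(seq: str, window: int, step: int) -> List[float]:
    """Period-3 SNR for each full sliding window (hypothetical helper)."""
    return [period3_snr(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def fisher_direction(noncoding: Sequence[Point], coding: Sequence[Point]) -> Point:
    """Fisher discriminant direction w = S_w^{-1} (m_coding - m_noncoding) in 2-D."""
    def mean(pts: Sequence[Point]) -> Point:
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    m0, m1 = mean(noncoding), mean(coding)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]]
    sxx = sxy = syy = 0.0
    for pts, m in ((noncoding, m0), (coding, m1)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form 2x2 inverse of S_w applied to the class-mean difference
    return ((syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det)
```

In use, each window would be described by a feature pair (for example, the period-3 SNR and a second index such as the PFD), `fisher_direction` would be fitted on labelled coding and non-coding windows, and a window would be classified by thresholding its projection onto the returned direction. The closed-form 2x2 inverse keeps the sketch dependency-free; a library implementation of linear discriminant analysis would be the natural choice for more than two features.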
Keywords :
DNA; biology computing; cellular biophysics; fractals; genetics; molecular biophysics; proteins; DNA sequences; Fisher linear discriminant analysis; Fourier measures; amino acid; coding sequences; codon index; fractal features; long genome sequences; noncoding sequences; nucleotide bases; period-3 fractal deviation; protein coding sequence identification; yeast chromosomes; Bioinformatics; Biological cells; DNA; Fractals; Fungi; Genomics; Information resources; Linear discriminant analysis; Phase frequency detector; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
Type :
conf
DOI :
10.1109/CSB.2004.1332486
Filename :
1332486