Title :
Protein coding region prediction based on the adaptive representation method
Author :
Marhon, Sajid A. ; Kremer, S.C.
Author_Institution :
Sch. of Comput. Sci., Univ. of Guelph, Guelph, ON, Canada
Abstract :
This article proposes a new protein-coding-region prediction technique. The technique maps DNA sequences to numerical strings using an adaptive representation scheme and then uses signal processing to identify coding regions. We learn a mapping from symbols to numerical sequences by computing the distribution variance of each nucleotide in a DNA sequence, and then use the period-3 spectrum to distinguish coding and non-coding regions. Compared to other spectral methods, our method boosts the period-3 spectrum peaks in putative protein-coding regions and attenuates the extraneous peaks in putative non-coding regions by learning to weight the signal by the C-G to A-T ratios. Our adaptive representation method outperforms all other state-of-the-art spectral methods on every benchmark dataset available according to 3 different performance measures.
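The period-3 step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' adaptive representation: the function name `period3_power`, the per-nucleotide `weights` argument, and the window and step sizes are all hypothetical choices made for illustration, and the weights here are simply supplied by the caller rather than learned from nucleotide distribution variance. With equal weights the sketch reduces to the conventional binary-indicator spectrum evaluated at frequency 1/3.

```python
import numpy as np

BASES = "ACGT"

def period3_power(seq, weights=None, window=351, step=3):
    """Sliding-window spectral power at frequency 1/3.

    Illustrative only: `weights` stands in for a learned symbol-to-number
    mapping and is NOT the scheme from the paper.
    """
    if weights is None:
        # Equal weights give the plain binary-indicator period-3 spectrum.
        weights = {b: 1.0 for b in BASES}
    seq = seq.upper()
    # Complex exponential at frequency 1/3 (DFT bin window/3 when window % 3 == 0).
    kernel = np.exp(-2j * np.pi * np.arange(window) / 3)
    # One 0/1 indicator sequence per nucleotide.
    indicators = {b: np.array([c == b for c in seq], dtype=float) for b in BASES}
    power = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for b in BASES:
            coeff = np.dot(indicators[b][start:start + window], kernel)
            total += weights[b] * abs(coeff) ** 2
        power.append(total)
    return np.array(power)

if __name__ == "__main__":
    periodic = "ATG" * 200  # perfectly 3-periodic toy sequence
    rng = np.random.default_rng(0)
    random_seq = "".join(rng.choice(list(BASES), size=600))
    print(period3_power(periodic).mean(), period3_power(random_seq).mean())
```

On the toy input, the strictly 3-periodic sequence yields far higher power than the random one, which is the contrast a spectral coding-region detector thresholds on; real genomic sequences show a much weaker separation.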
Keywords :
bioinformatics; proteins; signal processing; A-T ratios; C-G ratios; DNA sequences; adaptive representation method; numerical strings; period-3 spectrum; protein coding region prediction technique; Bioinformatics; DNA; Digital signal processing; Encoding; Genomics; Proteins; 3-Base periodicity; DNA computing; DNA spectral analysis; Gene finding
Conference_Title :
Electrical and Computer Engineering (CCECE), 2011 24th Canadian Conference on
Conference_Location :
Niagara Falls, ON
Print_ISBN :
978-1-4244-9788-1
ISSN :
0840-7789
DOI :
10.1109/CCECE.2011.6030484