DocumentCode
772591
Title
Signal Processing in Sequence Analysis: Advances in Eukaryotic Gene Prediction
Author
Akhtar, Mahmood ; Epps, Julien ; Ambikairajah, Eliathamby
Author_Institution
Nat. Univ. of Sci. & Technol., Rawalpindi
Volume
2
Issue
3
fYear
2008
fDate
6/1/2008
Firstpage
310
Lastpage
321
Abstract
Genomic sequence processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing researchers in recent years. A challenging open problem in deoxyribonucleic acid (DNA) sequence analysis is maximizing the prediction accuracy of eukaryotic gene locations and thereby protein coding regions. In this paper, DNA symbolic-to-numeric representations are presented and compared with existing techniques in terms of relative accuracy for the gene and exon prediction problem. Novel signal processing-based gene and exon prediction methods are then evaluated together with existing approaches at a nucleotide level using the Burset/Guigo1996, HMR195, and GENSCAN standard genomic datasets. A new technique for the recognition of acceptor splice sites is then proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method. By comparison with the acceptor splice site detection method used in the gene-finding program GENSCAN, the proposed DSP-statistical hybrid technique reveals a consistent reduction in false positives at different levels of sensitivity, averaging a 43% reduction when evaluated on the GENSCAN test set.
Keywords
biology computing; cellular biophysics; genetics; molecular biophysics; proteins; signal processing; statistical analysis; DNA symbolic-to-numeric representations; DSP-statistical hybrid technique; GENSCAN; acceptor splice sites; data-driven statistical method; deoxyribonucleic acid; eukaryotic gene prediction; exon prediction; gene prediction; gene-finding program; genomic sequence processing; nucleotide level; protein coding; sequence analysis; signal processing; standard genomic datasets; Accuracy; Bioinformatics; DNA; Digital signal processing; Genomics; Prediction methods; Proteins; Sequences; Signal analysis; Signal processing; Autoregressive processes; Gaussian mixture models; correlation; deoxyribonucleic acid (DNA); discrete Fourier transforms (DFTs); discrete cosine transforms (DCTs)
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2008.923854
Filename
4550545