Title :
Signal Processing in Sequence Analysis: Advances in Eukaryotic Gene Prediction
Author :
Akhtar, Mahmood ; Epps, Julien ; Ambikairajah, Eliathamby
Author_Institution :
Nat. Univ. of Sci. & Technol., Rawalpindi
Date :
6/1/2008
Abstract :
Genomic sequence processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing researchers in recent years. A challenging open problem in deoxyribonucleic acid (DNA) sequence analysis is maximizing the accuracy with which eukaryotic gene locations, and thereby protein coding regions, are predicted. In this paper, DNA symbolic-to-numeric representations are presented and compared with existing techniques in terms of relative accuracy for the gene and exon prediction problem. Novel signal processing-based gene and exon prediction methods are then evaluated together with existing approaches at a nucleotide level using the Burset/Guigo1996, HMR195, and GENSCAN standard genomic datasets. A new technique for the recognition of acceptor splice sites is also proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method. Compared with the acceptor splice site detection method used in the gene-finding program GENSCAN, the proposed DSP-statistical hybrid technique shows a consistent reduction in false positives at different levels of sensitivity, averaging a 43% reduction when evaluated on the GENSCAN test set.
Keywords :
biology computing; cellular biophysics; genetics; molecular biophysics; proteins; signal processing; statistical analysis; DNA symbolic-to-numeric representations; DSP-statistical hybrid technique; GENSCAN; acceptor splice sites; data-driven statistical method; deoxyribonucleic acid; eukaryotic gene prediction; exon prediction; gene prediction; gene-finding program; genomic sequence processing; nucleotide level; protein coding; sequence analysis; standard genomic datasets; Accuracy; Bioinformatics; DNA; Digital signal processing; Genomics; Prediction methods; Sequences; Signal analysis; Autoregressive processes; Gaussian mixture models; correlation; deoxyribonucleic acid (DNA); discrete Fourier transforms (DFTs); discrete cosine transforms (DCTs)
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
DOI :
10.1109/JSTSP.2008.923854