• DocumentCode
    772591
  • Title

    Signal Processing in Sequence Analysis: Advances in Eukaryotic Gene Prediction

  • Author

    Akhtar, Mahmood ; Epps, Julien ; Ambikairajah, Eliathamby

  • Author_Institution
    Nat. Univ. of Sci. & Technol., Rawalpindi
  • Volume
    2
  • Issue
    3
  • fYear
    2008
  • fDate
    6/1/2008
  • Firstpage
    310
  • Lastpage
    321
  • Abstract
    Genomic sequence processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing (DSP) researchers in recent years. A challenging open problem in deoxyribonucleic acid (DNA) sequence analysis is maximizing the prediction accuracy of eukaryotic gene locations and thereby of protein coding regions. In this paper, DNA symbolic-to-numeric representations are presented and compared with existing techniques in terms of relative accuracy for the gene and exon prediction problem. Novel signal processing-based gene and exon prediction methods are then evaluated together with existing approaches at the nucleotide level using the Burset/Guigo 1996, HMR195, and GENSCAN standard genomic datasets. A new technique for the recognition of acceptor splice sites is then proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method. Compared with the acceptor splice site detection method used in the gene-finding program GENSCAN, the proposed DSP-statistical hybrid technique yields a consistent reduction in false positives at different levels of sensitivity, averaging a 43% reduction when evaluated on the GENSCAN test set.
  • Keywords
    biology computing; cellular biophysics; genetics; molecular biophysics; proteins; signal processing; statistical analysis; DNA symbolic-to-numeric representations; DSP-statistical hybrid technique; GENSCAN; acceptor splice sites; data-driven statistical method; deoxyribonucleic acid; eukaryotic gene prediction; exon prediction; gene prediction; gene-finding program; genomic sequence processing; nucleotide level; protein coding; sequence analysis; standard genomic datasets; Accuracy; Bioinformatics; DNA; Digital signal processing; Genomics; Prediction methods; Proteins; Sequences; Signal analysis; Signal processing; Autoregressive processes; Gaussian mixture models; correlation; deoxyribonucleic acid (DNA); discrete Fourier transforms (DFTs); discrete cosine transforms (DCTs)
  • fLanguage
    English
  • Journal_Title
    IEEE Journal of Selected Topics in Signal Processing
  • Publisher
    IEEE
  • ISSN
    1932-4553
  • Type

    jour

  • DOI
    10.1109/JSTSP.2008.923854
  • Filename
    4550545