• DocumentCode
    1900163
  • Title
    On DNA Numerical Representations for Period-3 Based Exon Prediction
  • Author
    Akhtar, Mahmood ; Epps, Julien ; Ambikairajah, Eliathamby
  • Author_Institution
    Univ. of New South Wales, Sydney
  • fYear
    2007
  • fDate
    10-12 June 2007
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Processing DNA sequences with traditional digital signal processing methods requires, as a first step, their conversion from a character string into numerical sequences. Many previously introduced representations assign values to the four DNA nucleotides A, C, G, and T in ways that impose mathematical structures not present in the actual DNA sequence. In this paper, almost all existing methods are compared for the purpose of identifying protein coding regions, using the discrete Fourier transform (DFT) based spectral content measure to exploit period-3 behaviour in the exonic regions of the GENSCAN test set. False positive vs. sensitivity results, receiver operating characteristic (ROC) curves, and counts of exonic nucleotides detected as false positives all show that the two newly proposed DNA numerical representations perform better than the well-known Z-curve, tetrahedron, and Voss representations, with 66-75% less processing. Compared with the Voss representation, the proposed paired numeric method can produce relative improvements of up to 12% in prediction accuracy of exonic nucleotides at a 10% false positive rate on the GENSCAN test set.
  • Keywords
    DNA; Fourier transforms; biology computing; cellular biophysics; molecular biophysics; molecular configurations; proteins; sensitivity analysis; signal processing; DNA nucleotides; DNA numerical representations; DNA sequences; GENSCAN test set; Voss representation; digital signal processing; discrete Fourier transform; exonic nucleotides; period-3 based exon prediction; protein coding regions; receiver operating characteristic curves; Accuracy; DNA; Digital signal processing; Discrete Fourier transforms; Proteins; Sensitivity; Sequences; Signal mapping; Signal processing; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genomic Signal Processing and Statistics, 2007. GENSIPS 2007. IEEE International Workshop on
  • Conference_Location
    Tuusula
  • Print_ISBN
    978-1-4244-0998-3
  • Electronic_ISBN
    978-1-4244-0999-0
  • Type
    conf
  • DOI
    10.1109/GENSIPS.2007.4365821
  • Filename
    4365821
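
The abstract refers to a DFT-based spectral content measure computed on the Voss representation, used as the baseline. The following is a minimal Python sketch of that general idea, not the authors' implementation: each nucleotide gets a binary indicator sequence, and the squared DFT magnitude at frequency bin N/3 is summed over the four sequences within a sliding window. The function names, the default window length of 351, and the step size are illustrative assumptions that do not come from this record. The paper's two proposed representations are not reproduced here.

```python
import cmath


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss representation)."""
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}


def period3_power(window):
    """Sum of |X_b[N/3]|^2 over the four indicator sequences of one window.

    The DFT bin k = N/3 corresponds to the complex exponential exp(-2j*pi*t/3),
    so the window length should be a multiple of 3.
    """
    total = 0.0
    for x in voss_indicators(window).values():
        coeff = sum(v * cmath.exp(-2j * cmath.pi * t / 3) for t, v in enumerate(x))
        total += abs(coeff) ** 2
    return total


def spectral_content(seq, window=351, step=1):
    """Slide a window along seq and return the period-3 power at each position.

    window=351 and step=1 are assumed defaults chosen for illustration.
    """
    last_start = len(seq) - window
    return [period3_power(seq[i:i + window]) for i in range(0, last_start + 1, step)]
```

For example, `spectral_content(dna, window=351, step=3)` would return one value per window position. Regions where the values peak are candidates for protein coding regions, and the threshold applied to them controls the trade-off between sensitivity and false positives.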