• DocumentCode
    3379307
  • Title

    Application of hidden Markov models to gene prediction in DNA

  • Author

    Yin, Michael M. ; Wang, Jason T L

  • Author_Institution
    Dept. of Comput. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    40
  • Lastpage
    47
  • Abstract
    Programs currently available for gene prediction from within genomic DNA are far from being powerful enough to elucidate the gene structure completely. We develop a hidden Markov model (HMM) to represent the degeneracy features of splicing junction donor sites in eucaryotic genes. The HMM system is fully trained using an expectation maximization algorithm, and the system performance is evaluated using the 10-way cross-validation method. Experimental results show that our HMM system can correctly classify more than 95% of the candidate sequences into the right categories. More than 91% of the true donor sites and 97% of the false donor sites in the test data are classified correctly. These results are very promising, considering that only the local information in DNA is used. This model will be a very important component of an effective and accurate gene structure detection system currently being developed in our lab.
  • Keywords
    DNA; biology computing; hidden Markov models; optimisation; 10-way cross-validation method; HMM system; candidate sequences; degeneracy features; eucaryotic genes; expectation maximization algorithm; false donor sites; gene prediction; gene structure detection system; genomic DNA; junction donor sites; local information; system performance; true donor sites; Bioinformatics; DNA computing; Genomics; Machinery; Sequences; Signal processing; Splicing; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Intelligence and Systems, 1999. Proceedings. 1999 International Conference on
  • Conference_Location
    Bethesda, MD
  • Print_ISBN
    0-7695-0446-9
  • Type

    conf

  • DOI
    10.1109/ICIIS.1999.810222
  • Filename
    810222