• DocumentCode
    478688
  • Title
    Analysis of binary feature mapping rules for promoter recognition in imbalanced DNA sequence datasets using Support Vector Machine
  • Author
    Damasevicius, R.
  • Author_Institution
    Software Eng. Dept., Kaunas Univ. of Technol., Kaunas
  • Volume
    2
  • fYear
    2008
  • fDate
    6-8 Sept. 2008
  • Firstpage
    42694
  • Lastpage
    42699
  • Abstract
    Recognition of specific functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. Promoters, i.e., short regulatory DNA sequences located upstream of a gene, are one type of such fragment. Detection of promoters in DNA sequences is important for successful gene prediction. In this paper, a machine learning method called the support vector machine (SVM) is used to classify DNA sequences and recognize promoters. To achieve optimal classification, 11 rules for mapping DNA sequences into a binary SVM feature space are analyzed. Classification is performed using a power series kernel function, and the kernel parameters are optimized using a modification of the Nelder-Mead (downhill simplex) optimization method. Classification results for Drosophila and human sequence datasets are presented.
  • Keywords
    DNA; bioinformatics; data mining; feature extraction; genetics; learning (artificial intelligence); molecular biophysics; optimisation; pattern classification; sequences; support vector machines; Nelder-Mead optimization method; binary SVM feature mapping rule analysis; bioinformatics; biomolecular data mining; downhill simplex method; gene prediction; kernel parameter optimization; machine learning method; power series kernel function; promoter recognition; short regulatory DNA sequence dataset classification; support vector machine; Bioinformatics; Biological cells; DNA; Encoding; Kernel; Learning systems; Sequences; Support vector machine classification; Support vector machines; Training data; DNA mapping rules; Support Vector Machine; bioinformatics; data mining; promoter recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems, 2008. IS '08. 4th International IEEE Conference
  • Conference_Location
    Varna
  • Print_ISBN
    978-1-4244-1739-1
  • Electronic_ISBN
    978-1-4244-1740-7
  • Type
    conf
  • DOI
    10.1109/IS.2008.4670503
  • Filename
    4670503