  • DocumentCode
    2682526
  • Title
    Splice Site Recognition in DNA Sequences Using K-mer Frequency Based Mapping for Support Vector Machine with Power Series Kernel
  • Author
    Damasevicius, R.
  • Author_Institution
    Software Engineering Department, Kaunas University of Technology, Kaunas
  • fYear
    2008
  • fDate
    4-7 March 2008
  • Firstpage
    687
  • Lastpage
    692
  • Abstract
    Recognition of specific, functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. One such type of fragment is the splice-junction (intron-exon or exon-intron) site. Detecting splice-junction sites in DNA sequences is important for successful gene prediction. In this paper, a support vector machine (SVM) is used to classify DNA sequences and recognize splice sites. To achieve optimal classification, four position-independent, k-mer frequency-based methods for mapping DNA sequences into the SVM feature space are analyzed. Classification is performed using SVM power series kernels, whose parameters are optimized using a modification of the Nelder-Mead (downhill simplex) optimization method. Classification quality is evaluated using the F-measure, the harmonic mean of precision and recall. The best classification results are achieved using 4-mers for the exon-intron dataset (78%) and 6-mers for the intron-exon dataset (70%), using 4-nucleotide frequencies.
  • Keywords
    DNA; biology computing; genetics; molecular biophysics; pattern classification; support vector machines; DNA sequence; F-measure; Nelder-Mead optimization; bioinformatics; gene prediction; k-mer frequency based mapping; optimal classification; power series kernel; splice site recognition; splice-junction; support vector machine; Frequency; Kernel; Optimization methods; Proteins; Sequences; Support vector machine classification; feature mapping; k-mer frequency; machine learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2008)
  • Conference_Location
    Barcelona, Spain
  • Print_ISBN
    978-0-7695-3109-0
  • Type
    conf
  • DOI
    10.1109/CISIS.2008.41
  • Filename
    4606754