• DocumentCode
    1652729
  • Title
    An Effective Data Mining Technique for the Multi-Class Protein Sequence Classification
  • Author
    Ma, Patrick C H ; Chan, Keith C C
  • Author_Institution
    Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong
  • fYear
    2008
  • Firstpage
    486
  • Lastpage
    489
  • Abstract
    One way to understand the molecular mechanism of a cell is to understand the function of each protein encoded in its genome. The function of a protein depends largely on the three-dimensional structure the protein assumes after folding. Because determining three-dimensional structure experimentally is difficult and expensive, an easier and cheaper approach is to examine the primary sequence of a protein and determine its function by classifying the sequence into its corresponding functional family. In this paper, we propose an effective data mining technique for multi-class protein sequence classification. The proposed technique has been tested on different sets of protein sequences. Experimental results show that it outperforms other existing protein sequence classifiers and can effectively classify proteins into their corresponding functional families.
  • Keywords
    biology computing; cellular biophysics; data mining; molecular biophysics; molecular configurations; pattern classification; proteins; sequences; cell function; data mining technique; molecular mechanism; multiclass protein sequence classification; protein encoding; protein folding; protein functional families; protein genome; three-dimensional structure; Bioinformatics; Biological system modeling; Data mining; Genomics; Hidden Markov models; Protein sequence; Support vector machine classification; Support vector machines; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedical Engineering, 2008. ICBBE 2008. The 2nd International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-1747-6
  • Electronic_ISBN
    978-1-4244-1748-3
  • Type
    conf
  • DOI
    10.1109/ICBBE.2008.118
  • Filename
    4534998