• DocumentCode
    464304
  • Title

    Super Granular SVM Feature Elimination (Super GSVM-FE) Model for Protein Sequence Motif Information Extraction

  • Author

    Chen, Bernard ; Pellicer, Stephen ; Tai, Phang C. ; Harrison, Robert ; Pan, Yi

  • Author_Institution
    Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
  • fYear
    2007
  • fDate
    1-5 April 2007
  • Firstpage
    317
  • Lastpage
    322
  • Abstract
    Protein sequence motifs are gathering more and more attention in the sequence analysis area. These recurring regions have the potential to determine a protein's conformation, function, and activities. In our previous work, we tried to obtain protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large. To deal with large input datasets, we provided two granular computing models (the FIK and FGK models) to efficiently generate protein motif information. In this article, we develop a new method that combines the concept of granular computing with the power of ranking SVM to further extract protein sequence motif information. There are two reasons to eliminate redundant data. First, the information we try to generate is about sequence motifs, but the original input data are derived from whole protein sequences by a sliding window technique. Second, fuzzy c-means clustering can assign one segment to more than one information granule, yet not every data segment is directly related to the granule to which it is assigned. Applying this new feature elimination model increases the quality of motif information dramatically in all three evaluation measures. Compared with traditional methods that shrink the cluster size to obtain a more compact cluster, our approach shows improved results.
  • Keywords
    biology computing; pattern clustering; proteins; support vector machines; fuzzy c-means clustering; granular computing models; protein sequence motif information extraction; super GSVM-FE model; super granular SVM feature elimination; Bioinformatics; Biological system modeling; Clustering algorithms; Computational biology; Computational intelligence; Data mining; Protein sequence; Sequences; Space technology; Support vector machines; FGK Model; FIK Model; Feature Elimination; Protein Sequence Motif; Ranking SVM;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    1-4244-0710-9
  • Type

    conf

  • DOI
    10.1109/CIBCB.2007.4221239
  • Filename
    4221239