DocumentCode
464304
Title
Super Granular SVM Feature Elimination (Super GSVM-FE) Model for Protein Sequence Motif Information Extraction
Author
Chen, Bernard ; Pellicer, Stephen ; Tai, Phang C. ; Harrison, Robert ; Pan, Yi
Author_Institution
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
fYear
2007
fDate
1-5 April 2007
Firstpage
317
Lastpage
322
Abstract
Protein sequence motifs are attracting increasing attention in sequence analysis. These recurring regions have the potential to determine a protein's conformation, function, and activities. In our previous work, we sought protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large. To deal with large input datasets, we provided two granular computing models (the FIK and FGK models) to efficiently generate protein motif information. In this article, we develop a new method that combines the concept of granular computing with the power of ranking SVM to further extract protein sequence motif information. There are two reasons to eliminate redundant data: first, the information we try to generate is about sequence motifs, but the original input data are derived from whole protein sequences by a sliding window technique; second, fuzzy c-means clustering can assign one segment to more than one information granule, yet not all data segments are directly related to the granule to which they are assigned. Applying this new feature elimination model increases the quality of motif information dramatically in all three evaluation measures. Compared with traditional methods that shrink cluster size to obtain a more compact cluster, our approach shows improved results.
Keywords
biology computing; pattern clustering; proteins; support vector machines; fuzzy c-means clustering; granular computing models; protein sequence motif information extraction; super GSVM-FE model; super granular SVM feature elimination; Bioinformatics; Biological system modeling; Clustering algorithms; Computational biology; Computational intelligence; Data mining; Protein sequence; Sequences; Space technology; Support vector machines; FGK Model; FIK Model; Feature Elimination; Protein Sequence Motif; Ranking SVM;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location
Honolulu, HI
Print_ISBN
1-4244-0710-9
Type
conf
DOI
10.1109/CIBCB.2007.4221239
Filename
4221239