DocumentCode :
464304
Title :
Super Granular SVM Feature Elimination (Super GSVM-FE) Model for Protein Sequence Motif Information Extraction
Author :
Chen, Bernard ; Pellicer, Stephen ; Tai, Phang C. ; Harrison, Robert ; Pan, Yi
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA
fYear :
2007
fDate :
1-5 April 2007
Firstpage :
317
Lastpage :
322
Abstract :
Protein sequence motifs are gathering more and more attention in the sequence analysis area. These recurring regions have the potential to determine a protein's conformation, function and activities. In our previous work, we tried to obtain protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike the inputs of most popular motif discovery algorithms, our input dataset is extremely large. To deal with such large input datasets, we presented two granular computing models (the FIK and FGK models) to efficiently generate protein motif information. In this article, we develop a new method that combines the concept of granular computing with the power of ranking SVM to further extract protein sequence motif information. There are two reasons to eliminate redundant data. First, the information we try to generate concerns sequence motifs, but the original input data are derived from whole protein sequences by a sliding window technique. Second, fuzzy c-means clustering can assign one segment to more than one information granule, yet not all data segments are directly related to the granules they are assigned to. Applying this new feature elimination model dramatically increases the quality of motif information in all three evaluation measures. Compared with traditional methods, which shrink the cluster size to obtain a more compact cluster, our approach shows improved results.
Keywords :
biology computing; pattern clustering; proteins; support vector machines; fuzzy c-means clustering; granular computing models; protein sequence motif information extraction; super GSVM-FE model; super granular SVM feature elimination; Bioinformatics; Biological system modeling; Clustering algorithms; Computational biology; Computational intelligence; Data mining; Protein sequence; Sequences; Space technology; Support vector machines; FGK Model; FIK Model; Feature Elimination; Protein Sequence Motif; Ranking SVM;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0710-9
Type :
conf
DOI :
10.1109/CIBCB.2007.4221239
Filename :
4221239