Title :
FIK Model: Novel Efficient Granular Computing Model for Protein Sequence Motifs and Structure Information Discovery
Author :
Chen, Bernard ; Tai, Phang C. ; Harrison, Robert ; Pan, Yi
Author_Institution :
Comput. Sci. Dept., Georgia State Univ., Atlanta, GA
Abstract :
Protein sequence motif information is very important to the analysis of biologically significant regions. These conserved regions have the potential to determine the conformation, function, and activities of proteins. The main purpose of this paper is to obtain protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large, and an efficient technique is required. In this article, short recurring segments of proteins are explored using a novel granular computing strategy. First, the fuzzy C-means clustering algorithm (FCM) is used to separate the whole dataset into several smaller information granules; an improved K-means clustering algorithm is then applied to each granule to obtain the final results. The structural similarity of the clusters discovered by our approach is studied to analyze how the recurring patterns correlate with their structure. Some biochemical references are also included in our evaluation. To the best of our knowledge, this is the first time that the granular computing concept, as well as the DBI measure for evaluation, has been introduced to this dataset. Compared with the latest research results, our method requires only twenty percent of the execution time and obtains even higher-quality protein sequence motif information. The efficient and satisfactory results of our experiments suggest that our granular computing model, which combines FCM and improved K-means, has a good chance of being applied in other bioinformatics research fields and yielding strong results.
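The two-stage strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: sequence segments are already encoded as numeric feature vectors, each point is placed in the granule where its FCM membership is highest, plain K-means stands in for the paper's improved K-means, and "DBI" is taken to mean the Davies-Bouldin index. All function names and parameters below are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Textbook fuzzy C-means; returns (centers, membership matrix of shape n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def k_means(X, k, iters=100, seed=0):
    """Plain K-means (a stand-in, NOT the paper's improved K-means)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def davies_bouldin(X, centers, labels):
    """Davies-Bouldin index (assumed meaning of 'DBI'); lower is better."""
    k = centers.shape[0]
    scatter = np.array([
        np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
        if np.any(labels == j) else 0.0
        for j in range(k)
    ])
    sep = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)               # exclude i == j from the max
    return ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1).mean()

def fik_sketch(X, n_granules, k_per_granule, seed=0):
    """Stage 1: FCM splits X into granules. Stage 2: K-means inside each granule."""
    _, U = fuzzy_c_means(X, n_granules, seed=seed)
    granule_of = U.argmax(axis=1)               # assumption: hard assignment by max membership
    results = {}
    for g in range(n_granules):
        idx = np.flatnonzero(granule_of == g)
        if idx.size == 0:
            continue
        k = min(k_per_granule, idx.size)
        centers, labels = k_means(X[idx], k, seed=seed)
        results[g] = (idx, centers, labels)
    return results

# Toy usage on random vectors (illustrative data only, not protein segments).
X = np.random.default_rng(1).normal(size=(60, 4))
res = fik_sketch(X, n_granules=2, k_per_granule=3)
for g, (idx, centers, labels) in res.items():
    print(g, idx.size, davies_bouldin(X[idx], centers, labels))
```

The efficiency argument in the abstract is visible in the structure: each K-means run only sees the points of one granule, so the per-run distance computations are over a fraction of the full dataset, and the granules could in principle be processed independently.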
Keywords :
biochemistry; biological techniques; biology computing; data mining; fuzzy set theory; molecular biophysics; molecular configurations; pattern clustering; proteins; FIK model; K-means clustering algorithm; biochemical references; bioinformatics; fuzzy C-means clustering algorithm; granular computing model; protein conformation; protein sequence motifs; protein structure information discovery; recurring pattern correlation; Bioinformatics; Biological system modeling; Biology computing; Clustering algorithms; Fingerprint recognition; Pattern analysis; Predictive models; Protein engineering; Protein sequence; Sequences; FIK Model; Fuzzy C-Means Clustering; Improved K-means clustering; Sequence Motif
Conference_Title :
BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on
Conference_Location :
Arlington, VA
Print_ISBN :
0-7695-2727-2
DOI :
10.1109/BIBE.2006.253311