Title :
Towards simultaneous clustering and motif-modeling for a large number of protein family
Author :
Young Joon Yoo; Tushar Sandhan; Jinyoung Choi; Sun Kim
Author_Institution :
Dept. of Electr. & Comput. Eng., Seoul Nat. Univ., Seoul, South Korea
Abstract :
In this paper, we propose a novel clustering and motif-modeling framework that uses k-mers to analyze a large number of protein families. Our approach uses both the occurrence frequency and the position information of k-mers, which are essential for classification but were not fully exploited by previous methods. We found that the resulting structure is closely related to the motifs of the protein families and hence describes the important biological features, or motifs, of each family well. The classification/clustering procedure is executed incrementally, which was difficult for previous algorithms, and is modeled using a bipartite model. Furthermore, the method can be implemented efficiently using parallel computing and hashing. Experimental results on the entire COG family database show that our model can represent a large number of protein families without sacrificing accuracy. In addition, the classification structure, i.e., the path through the graph taken by each protein sequence, explains the characteristic subsequences, or motifs, of each family quite well. Thus, the proposed method has the potential to model both protein families and their motifs, even for a large number of families.
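The abstract does not give the model's details. As a rough illustration only, the sketch below shows one way k-mer occurrence frequency and position could be collected with hash tables (Python dicts) and linked to families through a bipartite k-mer/family index that is updated one sequence at a time. The function names (`kmer_profile`, `add_sequence`, `classify`), the choice k = 3, and the count-product scoring rule are hypothetical assumptions, not the authors' method.

```python
def kmer_profile(sequence: str, k: int = 3) -> dict:
    """Map each k-mer to the 0-based start positions where it occurs.

    The frequency of a k-mer is len(positions); the list keeps position info.
    """
    profile = {}
    for i in range(len(sequence) - k + 1):
        profile.setdefault(sequence[i:i + k], []).append(i)
    return profile


def add_sequence(index: dict, family: str, sequence: str, k: int = 3) -> None:
    """Hypothetical incremental update of a bipartite index.

    index maps k-mer -> {family: total occurrence count}; each entry is an
    edge between a k-mer node and a family node, weighted by frequency.
    """
    for kmer, positions in kmer_profile(sequence, k).items():
        edges = index.setdefault(kmer, {})
        edges[family] = edges.get(family, 0) + len(positions)


def classify(sequence: str, index: dict, k: int = 3):
    """Toy scorer (assumption): sum count products over shared k-mers.

    Returns the best-scoring family, or None if no k-mer is shared.
    """
    scores = {}
    for kmer, positions in kmer_profile(sequence, k).items():
        for family, count in index.get(kmer, {}).items():
            scores[family] = scores.get(family, 0) + count * len(positions)
    return max(scores, key=scores.get) if scores else None
```

Because each call to `add_sequence` touches only the k-mers of one sequence, the index can grow incrementally without rebuilding it. Positions are recorded by `kmer_profile`, but this toy scorer uses only the counts; a position-aware model would need an additional, unspecified scoring term.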
Keywords :
biology computing; molecular biophysics; pattern classification; pattern clustering; proteins; COG family database; bipartite model; classification structure; clustering; k-mer based graph model; motif modeling framework; parallel computing; protein family; protein sequences; Biological system modeling; Bipartite graph; Computational modeling; Hidden Markov models; Mathematical model; Protein engineering; Proteins; Bipartite model; Classification; Graph; K-mer; Motif; Protein family;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732605