Title :
A Robust Method for Biological Sequence Clustering
Author :
Chen, Wei-Bang ; Zhang, Chengcui
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, AL
Abstract :
In this paper, we propose a two-phase hybrid method for biological sequence clustering that combines the strengths of hierarchical agglomerative clustering and partition clustering. In phase I, the method uses a hierarchical agglomerative clustering algorithm to pre-cluster the aligned sequences; in phase II, it takes the pre-clustering result as the initial partition for a k-means partition clustering method based on profile hidden Markov models (HMMs). Such initial partitions, as opposed to random ones, are usually more reasonable and thus avoid the inconsistency that random initialization causes in partition clustering methods. In addition, the inaccuracy of hierarchical agglomerative clustering can be compensated for by the profile HMM based k-means partition clustering, since the latter is model-based and can better describe the dynamic properties of the data in a cluster. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed hybrid clustering algorithm.
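The two-phase structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes aligned DNA sequences of equal length over the alphabet `ACGT-`, uses Hamming distance with average linkage for phase I, and replaces the profile HMM with a much simpler per-column residue frequency profile (no insert or delete states) for phase II. All function names (`agglomerative`, `build_profile`, `refine`, `hybrid_cluster`) and parameters are hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACGT-"  # assumption: aligned DNA, '-' is the gap symbol


def hamming(a, b):
    """Number of mismatching columns between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def agglomerative(seqs, k):
    """Phase I: average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(hamming(seqs[a], seqs[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def build_profile(members, length, pseudo=1.0):
    """Per-column log-probabilities with additive smoothing.

    Simplified stand-in for a profile HMM; an empty cluster yields a
    uniform profile.
    """
    profile = []
    for col in range(length):
        counts = {ch: pseudo for ch in ALPHABET}
        for s in members:
            counts[s[col]] += 1
        total = sum(counts.values())
        profile.append({ch: math.log(n / total) for ch, n in counts.items()})
    return profile


def log_likelihood(seq, profile):
    """Log-probability of a sequence under a column-independent profile."""
    return sum(col[ch] for ch, col in zip(seq, profile))


def refine(seqs, labels, k, max_iter=50):
    """Phase II: k-means-style reassignment under per-cluster profiles."""
    length = len(seqs[0])
    for _ in range(max_iter):
        profiles = [
            build_profile([s for s, l in zip(seqs, labels) if l == c], length)
            for c in range(k)
        ]
        new = [max(range(k), key=lambda c: log_likelihood(s, profiles[c]))
               for s in seqs]
        if new == labels:  # converged: no sequence changed cluster
            break
        labels = new
    return labels


def hybrid_cluster(seqs, k):
    """Seed the model-based refinement with the hierarchical pre-clustering."""
    return refine(seqs, agglomerative(seqs, k), k)
```

Because phase II starts from the deterministic phase I partition rather than a random one, repeated runs on the same input return the same labels, which is the consistency property the abstract attributes to the hybrid design.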
Keywords :
biology computing; hidden Markov models; molecular biophysics; pattern clustering; biological sequence clustering; hidden Markov model; hierarchical agglomerative clustering; hybrid clustering algorithm; k-means partition clustering; molecular sequence dataset; Biological system modeling; Biology computing; Buildings; Clustering algorithms; Clustering methods; Hidden Markov models; Iterative algorithms; Iterative methods; Partitioning algorithms; Robustness;
Conference_Title :
Information Reuse and Integration, 2006 IEEE International Conference on
Conference_Location :
Waikoloa Village, HI
Print_ISBN :
0-7803-9788-6
DOI :
10.1109/IRI.2006.252427