DocumentCode :
464227
Title :
Biological Sequence Clustering and Classification with a Hybrid Method and Dynamic Programming
Author :
Chen, Wei-Bang ; Zhang, Chengcui ; Chen, Xin
Author_Institution :
Comput. & Inf. Sci. Dept., Univ. of Alabama at Birmingham, Birmingham, AL
Volume :
1
fYear :
2007
fDate :
21-23 May 2007
Firstpage :
684
Lastpage :
689
Abstract :
In this paper, we report a framework for biological sequence clustering and classification. The proposed framework adopts a two-phase hybrid method for clustering, and then uses dynamic programming for classification. The two-phase hybrid method combines the strengths of hierarchical and partition clustering. Phase I of the hybrid method uses hierarchical agglomerative clustering to pre-cluster the aligned sequences. Phase II performs partition clustering, which initializes its partition from the result of Phase I and uses profile Hidden Markov Models (HMMs) to represent clusters. The profile HMMs are then stored in the database for classifying unknown sequences, which is done by finding the best alignment of a sequence to each existing profile HMM. However, the profile HMMs and the sequence might differ in length. The dynamic programming technique proposed in our framework can efficiently find the optimal alignment for sequences of variable lengths, which enables the evaluation of cluster membership for any unknown sequence against fixed-length HMMs. Our experiments demonstrate the effectiveness and efficiency of the proposed framework for biological sequence clustering and classification.
Keywords :
biology computing; dynamic programming; hidden Markov models; pattern classification; pattern clustering; sequences; biological sequence classification; biological sequence clustering; database system; dynamic programming; hidden Markov model; hierarchical agglomerative clustering; hierarchical clustering; optimal alignment; partition clustering; two-phase hybrid method; Biology computing; Clustering algorithms; Clustering methods; Databases; Diseases; Drugs; Dynamic programming; Hidden Markov models; Iterative algorithms; Partitioning algorithms; Sequence clustering; classification; dynamic programming; hierarchical clustering; hybrid clustering; k-means; partition clustering; prediction; profile HMM;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Information Networking and Applications Workshops, 2007, AINAW '07. 21st International Conference on
Conference_Location :
Niagara Falls, Ont.
Print_ISBN :
978-0-7695-2847-2
Type :
conf
DOI :
10.1109/AINAW.2007.111
Filename :
4221137