DocumentCode
464227
Title
Biological Sequence Clustering and Classification with a Hybrid Method and Dynamic Programming
Author
Chen, Wei-Bang ; Zhang, Chengcui ; Chen, Xin
Author_Institution
Comput. & Inf. Sci. Dept., Univ. of Alabama at Birmingham, Birmingham, AL
Volume
1
fYear
2007
fDate
21-23 May 2007
Firstpage
684
Lastpage
689
Abstract
In this paper, we report a framework for biological sequence clustering and classification. The proposed framework adopts a two-phase hybrid method for clustering and then uses dynamic programming for classification. The two-phase hybrid method combines the strengths of hierarchical and partition clustering. Phase I uses hierarchical agglomerative clustering to pre-cluster the aligned sequences. Phase II performs partition clustering, which initializes its partition from the result of Phase I and uses profile Hidden Markov Models (HMMs) to represent clusters. The profile HMMs are then stored in the database for the classification of unknown sequences, which is done by finding the best alignment of a sequence to each existing profile HMM. However, a profile HMM and the sequence may differ in length. The dynamic programming technique proposed in our framework can efficiently find the optimal alignment for sequences of variable length, which enables the evaluation of cluster membership for any unknown sequence against fixed-length HMMs. Our experiments demonstrate the effectiveness and efficiency of the proposed framework for biological sequence clustering and classification.
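The classification step described in the abstract (aligning a variable-length sequence to each fixed-length profile and picking the best-scoring cluster) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simplified profile given as one emission-probability table per column, a uniform background frequency, and a constant gap penalty in place of HMM transition probabilities. The names `align_to_profile`, `classify`, `background`, `gap`, and `floor` are hypothetical.

```python
import math


def align_to_profile(seq, profile, background=0.25, gap=-2.0, floor=1e-3):
    """Best global alignment score of seq against a fixed-length profile.

    profile is a list of dicts mapping a symbol to its emission probability
    in that column. background, gap, and floor are illustrative assumptions,
    not values taken from the paper.
    """
    n, m = len(seq), len(profile)
    # score[i][j]: best score aligning seq[:i] to the first j profile columns
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading symbols not matched to any column
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading columns skipped by the sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = profile[j - 1].get(seq[i - 1], floor)
            match = score[i - 1][j - 1] + math.log(p / background)
            insert = score[i - 1][j] + gap  # extra symbol in the sequence
            delete = score[i][j - 1] + gap  # profile column left unmatched
            score[i][j] = max(match, insert, delete)
    return score[n][m]


def classify(seq, profiles):
    """Return the name of the profile with the highest alignment score."""
    return max(profiles, key=lambda name: align_to_profile(seq, profiles[name]))
```

Because the table has one row per sequence symbol and one column per profile position, the sequence and the profile need not have the same length; the cost is proportional to the product of the two lengths for each stored profile.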
Keywords
biology computing; dynamic programming; hidden Markov models; pattern classification; pattern clustering; sequences; biological sequence classification; biological sequence clustering; database system; dynamic programming; hidden Markov model; hierarchical agglomerative clustering; hierarchical clustering; optimal alignment; partition clustering; two-phase hybrid method; Biology computing; Clustering algorithms; Clustering methods; Databases; Diseases; Drugs; Dynamic programming; Hidden Markov models; Iterative algorithms; Partitioning algorithms; Sequence clustering; classification; dynamic programming; hierarchical clustering; hybrid clustering; k-means; partition clustering; prediction; profile HMM;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications Workshops, 2007, AINAW '07. 21st International Conference on
Conference_Location
Niagara Falls, Ont.
Print_ISBN
978-0-7695-2847-2
Type
conf
DOI
10.1109/AINAW.2007.111
Filename
4221137