DocumentCode :
1031736
Title :
Designing Patterns and Profiles for Faster HMM Search
Author :
Sun, Yanni ; Buhler, Jeremy
Author_Institution :
Dept. of Comput. Sci. & Eng., Washington Univ., St. Louis, MO
Volume :
6
Issue :
2
fYear :
2009
Firstpage :
232
Lastpage :
243
Abstract :
Profile HMMs are powerful tools for modeling conserved motifs in proteins. They are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. It is highly desirable to speed up HMM search in large databases. We design PROSITE-like patterns and short profiles that are used as filters to rapidly eliminate protein-motif pairs for which a full profile HMM comparison does not yield a significant match. The design of the pattern-based filters is formulated as a multichoice knapsack problem. Profile-based filters with high sensitivity are extracted from a profile HMM based on their theoretical sensitivity and false positive rate. Experiments show that our profile-based filters achieve high sensitivity (near 100 percent) while keeping around 20× speedup with respect to the unfiltered search program. Pattern-based filters typically retain at least 90 percent of the sensitivity of the source HMM with 30-40× speedup. The profile-based filters have sensitivity comparable to the multistage filtering strategy HMMERHEAD and are faster in most of our experiments.
Keywords :
bioinformatics; hidden Markov models; macromolecules; proteins; proteomics; search problems; HMM search; HMMERHEAD; PROSITE-like patterns; multichoice knapsack problem; multistage filtering strategy; pattern-based filters; profile-based filters; protein sequences; protein-motif pairs; proteome; proteomic sequence data; Biology and genetics; Pfam; bioinformatics databases; filtration; pattern match; profile; profile hidden Markov model; sequence similarity search; Algorithms; Amino Acid Motifs; Amino Acid Sequence; Conserved Sequence; Databases, Protein; Markov Chains; Pattern Recognition, Automated; Sensitivity and Specificity; Sequence Analysis, Protein; Sequence Homology, Amino Acid
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2008.14
Filename :
4429177