Title :
Sequence Homology Search Based on Database Indexing Using the Profile Hidden Markov Model
Author :
Xue, Qiang ; Cole, Jeremy ; Pramanik, Sakti
Author_Institution :
Dept. of Comput. Sci. & Eng., Michigan State Univ.
Abstract :
The profile hidden Markov model (PHMM) has received increasing attention in the field of protein homology detection, since profile-based methods are much more sensitive than pairwise methods in detecting distant homologous relationships. PHMM searches are often performed by purely dynamic-programming-based systems. However, these systems are very time-consuming for a large database: for instance, searching the GenBank protein sequence database with a short model of length 12 may take approximately 15 minutes. Instead of searching the database sequentially, we search it using a tree-structured database index called the HD-tree. The HD-tree significantly reduces PHMM search time without reducing the quality of the search results. The performance of search using the HD-tree is compared with that of HMMER, a popular PHMM implementation for protein sequence analysis. It is shown that the HD-tree approach is orders of magnitude faster than HMMER for short queries.
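To illustrate the sequential dynamic-programming search that the abstract identifies as the bottleneck, here is a minimal sketch. It is neither the authors' code nor HMMER's Plan7 implementation. It provides a log-space Viterbi scorer for a toy profile HMM, with match, insert and delete states, plus a linear scan that runs one O(K x L) dynamic program per database sequence, where K is the model length and L is the sequence length. Several simplifications are assumptions made here for brevity: transition probabilities are shared by all model positions, there are no flanking or local-alignment states, and raw log probabilities are used instead of log-odds scores. All names (viterbi_log_score, scan, the toy "WKL" model) are hypothetical. The HD-tree itself is not sketched, because the abstract does not describe its structure.

```python
import math

NEG_INF = float("-inf")
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(seq, match_emit, insert_emit, t):
    """Best-path (Viterbi) log probability of seq under a toy profile HMM.

    Hypothetical parameterization (a simplification, not HMMER's Plan7):
      match_emit[k][a]: P(residue a | match state M_{k+1}), k = 0..K-1
      insert_emit[a]:   P(residue a | insert state), shared by all positions
      t: transition probabilities shared by all nodes, with keys
         "MM", "MI", "MD", "IM", "II", "DM", "DD" (no I->D or D->I moves).
    The begin state acts as M_0; a path may end in M_K, I_K or D_K.
    """
    K, L = len(match_emit), len(seq)
    lt = {key: log(p) for key, p in t.items()}
    # V*[i][k]: best log probability of emitting seq[:i] and being in node k.
    VM = [[NEG_INF] * (K + 1) for _ in range(L + 1)]
    VI = [[NEG_INF] * (K + 1) for _ in range(L + 1)]
    VD = [[NEG_INF] * (K + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # begin state, nothing emitted yet
    for i in range(L + 1):
        for k in range(K + 1):
            if i > 0 and k > 0:
                # Match state M_k emits seq[i-1] and advances from node k-1.
                e = log(match_emit[k - 1].get(seq[i - 1], 0.0))
                VM[i][k] = e + max(VM[i - 1][k - 1] + lt["MM"],
                                   VI[i - 1][k - 1] + lt["IM"],
                                   VD[i - 1][k - 1] + lt["DM"])
            if i > 0:
                # Insert state I_k emits seq[i-1] without advancing.
                e = log(insert_emit.get(seq[i - 1], 0.0))
                VI[i][k] = e + max(VM[i - 1][k] + lt["MI"],
                                   VI[i - 1][k] + lt["II"])
            if k > 0:
                # Delete state D_k is silent: it advances but emits nothing.
                VD[i][k] = max(VM[i][k - 1] + lt["MD"],
                               VD[i][k - 1] + lt["DD"])
    return max(VM[L][K], VI[L][K], VD[L][K])


def scan(database, model, threshold):
    """Sequential search: one full DP per sequence, best hits first."""
    hits = []
    for name, seq in database:
        score = viterbi_log_score(seq, *model)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: -hit[1])


def peaked(residue, p=0.8):
    """Emission distribution putting mass p on one residue, rest uniform."""
    rest = (1 - p) / (len(ALPHABET) - 1)
    return {a: (p if a == residue else rest) for a in ALPHABET}


# Toy model of length 3 whose consensus is "WKL".
match_emit = [peaked(c) for c in "WKL"]
insert_emit = {a: 1 / len(ALPHABET) for a in ALPHABET}
t = {"MM": 0.9, "MI": 0.05, "MD": 0.05,
     "IM": 0.5, "II": 0.5, "DM": 0.5, "DD": 0.5}
model = (match_emit, insert_emit, t)

db = [("hit", "WKL"), ("gap", "WKAL"), ("miss", "PPP")]
print(scan(db, model, threshold=-10.0))  # "hit" and "gap" pass, "miss" does not
```

Because scan touches every sequence with a full K x L table, its cost grows linearly with database size. This is the cost that an index such as the HD-tree is meant to avoid.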
Keywords :
biochemistry; biology computing; database indexing; dynamic programming; hidden Markov models; molecular biophysics; proteins; tree data structures; GenBank protein sequence database; HD-tree approach; PHMM search time; distant homologous relationship; dynamic-programming-based system; profile hidden Markov model; protein homology detection; protein sequence analysis; sequence homology search; tree-structured database indexing; Bioinformatics; Computer science; Databases; Dynamic programming; Genomics; Hidden Markov models; Indexes; Indexing; Protein sequence; Read-write memory;
Conference_Title :
BioInformatics and BioEngineering, 2006. BIBE 2006. Sixth IEEE Symposium on
Conference_Location :
Arlington, VA
Print_ISBN :
0-7695-2727-2
DOI :
10.1109/BIBE.2006.253326