DocumentCode :
2009558
Title :
A Hidden Markov Model Approach to Identifying HTH Motifs Using Protein Sequence and Predicted Solvent Accessibility
Author :
Yan, Changhui ; Hu, Jing
Author_Institution :
Comput. Sci. Dept., Utah State Univ., Logan, UT
fYear :
2006
fDate :
28-29 Sept. 2006
Firstpage :
1
Lastpage :
7
Abstract :
This paper presents a hidden Markov model method (referred to as HMM_AA_SA) for the identification of Helix-Turn-Helix (HTH) DNA-binding motifs. The method takes amino acid sequence and predicted solvent accessibility as input. The solvent accessibility of each amino acid is predicted from the amino acid sequence and discretized into three categories: buried (B), medium (M) and exposed (E). At each state, HMM_AA_SA emits one amino acid letter and one solvent accessibility letter. The method is evaluated using 12 families of HTH motifs from Pfam. Hidden Markov models are built and tested for each family individually using three-fold cross-validation. The results show that adding predicted solvent accessibility to the model increases the sensitivity by 5.7%, reaching 94.9%. We explore several reduced amino acid alphabets in order to reduce the complexity of protein sequences and the number of parameters in the model. The results show that using reduced alphabets not only reduces the number of parameters in the system but also improves performance. One interesting discovery is that an HMM_AA_SA model built from one HTH family can identify HTH motifs from other families, suggesting that the HMM_AA_SA method can capture features shared by different families of HTH motifs. This ability is improved when the hidden Markov models are built from the sequence fragments directly involved in the HTH motifs.
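The two-letter emission described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the accessibility cutoffs, the function names, the toy probabilities, and the assumption that the amino acid letter and the accessibility letter are conditionally independent given the state are all hypothetical, since the abstract does not specify them.

```python
import math

# Hypothetical cutoffs on relative solvent accessibility (range 0..1);
# the abstract does not state the thresholds that were used.
BURIED_MAX = 0.1
MEDIUM_MAX = 0.4


def discretize_accessibility(rsa):
    """Map a relative solvent accessibility value to 'B', 'M' or 'E'."""
    if rsa < BURIED_MAX:
        return "B"
    if rsa < MEDIUM_MAX:
        return "M"
    return "E"


def joint_emission_log_prob(state, residue, sa_letter):
    """Log-probability that a state emits the pair (residue, sa_letter).

    Assumes the two letters are conditionally independent given the state.
    This is an illustrative simplification, not a detail from the abstract.
    """
    return math.log(state["aa"][residue]) + math.log(state["sa"][sa_letter])


# Toy state with made-up emission probabilities (illustration only).
state = {
    "aa": {"A": 0.5, "G": 0.5},
    "sa": {"B": 0.2, "M": 0.3, "E": 0.5},
}

# Discretize three example accessibility values, then score one pair.
letters = [discretize_accessibility(x) for x in (0.05, 0.25, 0.80)]
score = joint_emission_log_prob(state, "A", letters[2])
```

In this sketch each state carries two emission tables, one per alphabet, so a reduced amino acid alphabet would shrink only the "aa" table.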
Keywords :
hidden Markov models; molecular configurations; proteins; sequences; HMM_AA_SA; HTH motifs; Helix-Turn-Helix DNA-binding motifs; amino acid sequence; cross validations; hidden Markov model; protein sequence; solvent accessibility; Amino acids; Computer architecture; Computer science; DNA; Hidden Markov models; Predictive models; Protein sequence; Sequences; Solvents; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
Conference_Location :
Toronto, Ont.
Print_ISBN :
1-4244-0623-4
Electronic_ISBN :
1-4244-0624-2
Type :
conf
DOI :
10.1109/CIBCB.2006.331005
Filename :
4133147