Title :
Protein classification using Hidden Markov models and randomised decision trees
Author :
Lacey, Arron ; Jingjing Deng ; Xianghua Xie
Author_Institution :
Dept. of Comput. Sci., Swansea Univ., Swansea, UK
Abstract :
Since the introduction of next-generation sequencing there has been a demand for sophisticated methods to classify proteins based on sequence data. Two main approaches to this task are to align the raw sequence data against other sequences, or to extract discrete high-level features from the protein sequences and compare those features. Two machine learning methods are demonstrated, one for each approach. Profile Hidden Markov Models are built from multiple alignments of raw sequence data; they learn amino acid emission and transition parameters for a given alignment, and so harness the power of aligning a test protein to a model built from many proteins. Random Forests, on the other hand, are used to discriminate between two sets of proteins based on features extracted from the raw sequences, such as functional amino acid groups and physicochemical properties. The strengths and limitations of each method are presented and discussed, focussing on their individual merits and how they could complement each other, rather than comparing them by classification accuracy alone.
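As a rough illustration of the feature-based approach described above, the following minimal Python sketch turns a raw protein sequence into a fixed-length vector of amino acid group fractions. The grouping in `AMINO_ACID_GROUPS` and the function name `group_composition` are hypothetical choices made for this example; the abstract does not specify which groups or physicochemical properties the authors used.

```python
from collections import Counter

# Hypothetical grouping of the 20 standard amino acids by side-chain type.
# This is an illustrative assumption, not the grouping used in the paper.
AMINO_ACID_GROUPS = {
    "hydrophobic": set("AVLIMFWP"),
    "polar": set("STNQCYG"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def group_composition(sequence):
    """Return the fraction of recognised residues falling in each group."""
    counts = Counter()
    for residue in sequence.upper():
        for name, members in AMINO_ACID_GROUPS.items():
            if residue in members:
                counts[name] += 1
                break  # each residue belongs to exactly one group
    total = sum(counts.values())
    if total == 0:
        # Empty sequence, or no standard residues: return an all-zero vector.
        return {name: 0.0 for name in AMINO_ACID_GROUPS}
    return {name: counts[name] / total for name in AMINO_ACID_GROUPS}
```

Vectors of this kind, one per protein, could then be passed to a random forest implementation (for example scikit-learn's `RandomForestClassifier`) to discriminate between two sets of proteins. Residues outside the 20 standard letters are simply ignored in this sketch.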
Keywords :
1/f noise; biochemistry; biology computing; decision trees; feature extraction; hidden Markov models; learning (artificial intelligence); molecular biophysics; molecular configurations; pattern classification; proteins; amino acid emission parameters; amino acid transition parameters; classification accuracy; discrete high level feature extraction; functional amino acid groups; machine learning methods; multiple alignment; next generation sequencing; physiochemical properties; profile hidden Markov models; protein classification; protein sequences; random forests; randomised decision trees; raw sequence data; Amino acids; Anti-freeze; Feature extraction; Hidden Markov models; Proteins; Radio frequency; Training;
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2014 7th International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4799-5837-5
DOI :
10.1109/BMEI.2014.7002856