DocumentCode :
1300526
Title :
Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning
Author :
Angadi, U.B. ; Venkatesulu, M.
Author_Institution :
Nat. Inst. of Animal Nutrition & Physiol., Kalasalingam Univ., Bangalore, India
Volume :
9
Issue :
2
fYear :
2012
Firstpage :
601
Lastpage :
608
Abstract :
A major research direction in bioinformatics is assigning superfamily classifications to a given set of proteins. The classification reflects structural, evolutionary, and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analysis of proteins, yet a large number of proteins remain unclassified. In this study, we propose an unsupervised machine learning approach to classify a given set of proteins and assign them to SCOP superfamilies. We construct a database and a similarity matrix from P-values obtained in an all-against-all BLAST run, and train the network with the ART2 unsupervised learning algorithm using the rows of the similarity matrix as input vectors. The trained network classifies the proteins with F-measure values ranging from 0.82 to 0.97. The performance of ART2 is compared with that of spectral clustering, Random forest, SVM, and HHpred. ART2 performs better than all of these except HHpred. HHpred performs better than ART2, and its sum of errors is smaller than that of the other methods evaluated.
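The two quantitative ingredients named in the abstract, a similarity matrix built from pairwise BLAST P-values and the F-measure used for scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `-log10` scaling, and the `floor` and `cap` parameters are assumptions chosen for illustration, and the ART2 network itself is not reproduced here.

```python
import math


def similarity_matrix(ids, pvalues, floor=1e-180, cap=180.0):
    """Build a symmetric similarity matrix from pairwise P-values.

    `pvalues` maps (id_a, id_b) -> P-value. Pairs that are absent are
    treated as "no hit" and keep a similarity of 0. The -log10 transform
    scaled to [0, 1] is an assumed choice, not taken from the paper.
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # every protein is maximally similar to itself
    for (a, b), p in pvalues.items():
        score = min(-math.log10(max(p, floor)), cap) / cap
        i, j = index[a], index[b]
        # Keep the stronger of the two directions so the matrix is symmetric.
        best = max(sim[i][j], score)
        sim[i][j] = sim[j][i] = best
    return sim


def f_measure(cluster, reference):
    """Harmonic mean of precision and recall for one cluster vs. one class."""
    cluster, reference = set(cluster), set(reference)
    true_positives = len(cluster & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(cluster)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three proteins and a few pairwise P-values.
ids = ["p1", "p2", "p3"]
pvalues = {("p1", "p2"): 1e-90, ("p2", "p1"): 1e-60, ("p1", "p3"): 0.5}
sim = similarity_matrix(ids, pvalues)

# Each row of `sim` would serve as one input vector to the clustering step.
score = f_measure({"p1", "p2", "p3"}, {"p1", "p2"})
```

Each row of `sim` plays the role of an input vector for one protein, and `f_measure` compares one predicted cluster against one reference superfamily.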
Keywords :
bioinformatics; matrix algebra; molecular biophysics; proteins; proteomics; unsupervised learning; ART2 unsupervised learning; HHpred; Random forest; SVM; Structural Classification of Protein; bioinformatics; proteins; similarity matrix; spectral clustering; structural SCOP superfamily level classification; unsupervised machine learning; Bioinformatics; Databases; Hidden Markov models; Matrices; Proteins; Support vector machines; Training; ART2 neural network; Protein classification; SCOP; unsupervised learning; Algorithms; Cluster Analysis; Computational Biology; Models, Statistical; Neural Networks (Computer); Pattern Recognition, Automated; Protein Structure, Tertiary; Proteins; Sequence Analysis, Protein;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.114
Filename :
5989791