Title :
Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier
Author :
Hu, Hae-Jin ; Pan, Yi ; Harrison, Robert ; Tai, Phang C.
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
Abstract :
Prediction of protein secondary structure is an important problem in bioinformatics with many applications. Recent secondary structure prediction studies are mostly based on neural networks or the support vector machine (SVM). The SVM is a comparatively new learning system that has mostly been used in pattern recognition problems. In this study, SVM is used as the machine learning tool for secondary structure prediction, and several encoding schemes, including the orthogonal matrix, the hydrophobicity matrix, the BLOSUM62 substitution matrix, and a combined matrix of these, are applied and optimized to improve prediction accuracy. The optimal window length for six SVM binary classifiers is established by testing different window sizes, and our new encoding scheme is tested at this optimal window size via sevenfold cross-validation. The results show a 2% increase in the accuracy of the binary classifiers compared with the classical orthogonal matrix. Finally, to combine the results of the six SVM binary classifiers, a new tertiary classifier that combines the results of one-versus-one binary classifiers is introduced, and its performance is compared with that of existing tertiary classifiers. According to the results, the Q3 prediction accuracy of the new tertiary classifier reaches 78.8%, which is better than the best result reported in the literature.
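As an illustration of the classical orthogonal (one-hot) encoding over a sliding window that the abstract uses as its baseline, a minimal Python sketch follows. It is not the authors' implementation. The function name, the extra 21st slot for padding and unknown residues, and the default window of 13 are assumptions chosen for illustration only; the abstract does not state the optimal window size or the details of the new encoding scheme.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD_INDEX = len(AMINO_ACIDS)           # assumed extra slot: padding / unknown residue
DIM = len(AMINO_ACIDS) + 1             # 21 values per window position (assumption)


def orthogonal_window_encoding(sequence, window=13):
    """Return one feature row per residue (hypothetical helper).

    Each row concatenates `window` one-hot vectors of length DIM,
    centred on the residue being described. Positions that fall
    outside the sequence, and residues not in AMINO_ACIDS, are
    mapped to the padding slot.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features = np.zeros((len(sequence), window * DIM))
    for center in range(len(sequence)):
        for offset in range(-half, half + 1):
            pos = center + offset
            if 0 <= pos < len(sequence):
                slot = index.get(sequence[pos], PAD_INDEX)
            else:
                slot = PAD_INDEX
            # Block (offset + half) of the row holds this window position.
            features[center, (offset + half) * DIM + slot] = 1.0
    return features
```

Rows produced this way could be passed to any binary SVM implementation. Replacing each one-hot vector with a row of a substitution or hydrophobicity matrix would be one way to experiment with alternative encodings, although the exact combination used in the paper is not given in the abstract.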
Keywords :
biology computing; encoding; learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; support vector machines; BLOSUM62 substitution matrix; advanced tertiary classifier; bioinformatics; encoding; hydrophobicity matrix; improved protein secondary structure prediction; learning system; machine learning tool; neural network; orthogonal matrix; pattern recognition; support vector machine; Accuracy; Bioinformatics; Encoding; Learning systems; Matrices; Neural networks; Proteins; Support vector machine classification; Support vector machines; Testing; BLOSUM62; Binary classifier; Position Specific Scoring Matrix (PSSM); encoding scheme; orthogonal matrix; support vector machine (SVM); tertiary classifier; Algorithms; Amino Acid Sequence; Artificial Intelligence; Models, Chemical; Models, Molecular; Molecular Sequence Data; Protein Conformation; Protein Structure, Secondary; Protein Structure, Tertiary; Proteins; Sequence Alignment; Sequence Analysis, Protein
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2004.837906