Title :
Protein secondary structure prediction using BLAST and Relaxed Threshold Rule Induction from Coverings
Author :
Lee, Leong ; Leopold, Jennifer L. ; Frank, Ronald L.
Author_Institution :
Dept. of Comput. Sci., Univ. of North Carolina at Greensboro, Greensboro, NC, USA
Abstract :
Protein structure prediction has long been an important and challenging research problem in bioinformatics. Yet the determination of protein structures by time-consuming and relatively expensive experimental methods continues to lag far behind the explosive discovery of protein sequences. With the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of the best computational prediction methods has finally exceeded 80%. Herein we present a rule-based data-mining approach called BLAST-RT-RICO (Relaxed Threshold Rule Induction from Coverings) that utilizes multiple sequence alignment information to predict protein secondary structure. This method uses the PSI-BLAST algorithm to identify suitable proteins, and then generates rules from these proteins that can be used to predict secondary structure. By also utilizing known homologous template secondary structures in the Protein Data Bank (PDB), BLAST-RT-RICO achieved a Q3 score of 89.93% on the standard test dataset RS126 and a Q3 score of 87.71% on the standard test dataset CB396. These preliminary results suggest that this rule-based method may be the foundation for even more accurate prediction of protein secondary structure in the future.
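For readers unfamiliar with the Q3 score quoted above: it is conventionally the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The following is a minimal sketch of that calculation only, not the authors' evaluation code; the function name `q3_score` and the two example strings are hypothetical and chosen purely for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Hypothetical helper for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residues whose predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up 15-residue example: the two strings differ at exactly one position.
predicted = "CCHHHHHCCEEEECC"
observed = "CCHHHHHHCEEEECC"
score = q3_score(predicted, observed)
print(f"Q3 = {score:.2f}%")
```

In this toy case 14 of the 15 residues agree. Scores on a benchmark such as RS126 or CB396 are typically obtained by applying the same per-residue comparison across every protein in the set, although the exact aggregation used in the paper is not stated in this abstract.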
Keywords :
bioinformatics; data mining; knowledge based systems; proteins; BLAST-RT-RICO; PSI-BLAST algorithm; artificial intelligence algorithms; protein data bank database; protein secondary structure prediction; protein sequences; relaxed threshold rule induction from coverings; rule-based data-mining approach; sequence alignment information; Accuracy; Amino acids; Artificial neural networks; Prediction algorithms; Prediction methods; Training; BLAST
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9896-3
DOI :
10.1109/CIBCB.2011.5948462