DocumentCode :
2319247
Title :
Protein secondary structure prediction using BLAST and exhaustive RT-RICO, the search for optimal segment length and threshold
Author :
Lee, Leong ; Leopold, Jennifer L. ; Frank, Ronald L.
Author_Institution :
Dept. of Comput. Sci., Austin Peay State Univ., Clarksville, TN, USA
fYear :
2012
fDate :
9-12 May 2012
Firstpage :
35
Lastpage :
42
Abstract :
Predicting a protein's secondary structure from its amino acid sequence is a well-studied computational problem in bioinformatics and data mining. It can be viewed as an intermediate objective on the way to solving the more challenging problem of predicting a protein's three-dimensional structure, which is one of the most important research goals of bioinformatics. Although the secondary structure prediction problem was first defined in the 1960s, the prediction accuracy of the most modern methods still hovers around 80%. In [1] this research team presented a protein secondary structure prediction method, BLAST-RT-RICO (Relaxed Threshold Rule Induction from Coverings), which uses a modified association rule learning approach, together with multiple sequence alignment information, to predict secondary structures. Although it produced higher prediction accuracy than many other contemporary methods, that preliminary study identified several crucial areas in need of improvement, such as determining the optimal segment length, finding the optimal threshold value, and reducing the time complexity of the rule generation algorithm. In this paper, we present a modified method, BLAST-ERT-RICO (Exhaustive Relaxed Threshold Rule Induction from Coverings), which has improved time complexity as well as better choices of segment length and threshold value. Preliminary test results showed that with a segment length of 9 amino acid residues and a threshold value of 0.8, BLAST-ERT-RICO achieved a Q3 score of 92.19% on the standard test dataset RS126, which suggests that this approach may become even more useful as a secondary structure prediction method in the future.
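The abstract reports results as a Q3 score, the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly, and describes the method as working on fixed-length segments of the sequence (9 residues in the best reported setting). The Python sketch below illustrates only those two ideas. It assumes segments are windows that slide one residue at a time and that labels use a plain H/E/C alphabet; the function names are hypothetical, and nothing here reproduces the BLAST-ERT-RICO rule induction itself.

```python
from typing import List


def segments(sequence: str, length: int = 9) -> List[str]:
    """Return every contiguous window of `length` residues, sliding by one.

    Assumption: one-residue sliding windows; the paper's exact
    segmentation scheme may differ.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E, C) matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, 22 residues
    windows = segments(seq, 9)
    print(len(windows), windows[0])             # 14 MKTAYIAKQ
    print(q3_score("HHHECCCE", "HHHEECCC"))     # 75.0
```

A 22-residue sequence yields 22 - 9 + 1 = 14 windows of length 9, and 6 correct labels out of 8 give a Q3 of 75.0%. Increasing the segment length gives each window more sequence context but produces fewer windows per sequence, which is one reason the segment length is a parameter worth tuning.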
Keywords :
bioinformatics; biological techniques; data mining; molecular biophysics; molecular configurations; proteins; BLAST-RT-RICO; Exhaustive Relaxed Threshold Rule Induction from Coverings; amino acid sequence; bioinformatics; data mining; exhaustive RT-RICO; modified association rule learning approach; multiple sequence alignment information; optimal segment length search; optimal threshold value; prediction accuracy; protein 3D structure prediction problem; protein secondary structure prediction; rule generation algorithm; time complexity improvement; Accuracy; Amino acids; Bismuth; Prediction algorithms; Prediction methods; Proteins; Training; BLAST; data mining; protein secondary structure prediction;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2012 IEEE Symposium on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-1190-8
Type :
conf
DOI :
10.1109/CIBCB.2012.6217208
Filename :
6217208