Regional Information Center for Science and Technology - Exhaustive RT-RICO algorithm for mining association rules in protein secondary structure

DocumentCode :

2319774

Title :

Exhaustive RT-RICO algorithm for mining association rules in protein secondary structure

Author :

Lee, Leong ; Leopold, Jennifer L. ; Frank, Ronald L.

Author_Institution :

Department of Computer Science, Austin Peay State University, Clarksville, TN, USA

fYear :

2012

fDate :

9-12 May 2012

Firstpage :

260

Lastpage :

266

Abstract :

Prediction of a protein's secondary structure from its amino acid sequence is a well-studied computational problem in bioinformatics and has significant practical research value. Although the secondary structure prediction problem was first defined almost fifty years ago, the accuracy of most modern methods still hovers around 80%. In [1], this research team presented a promising protein secondary structure prediction method, BLAST-RT-RICO (Relaxed Threshold Rule Induction from Coverings), that employs a modified association rule learning approach and utilizes multiple sequence alignment information. BLAST-RT-RICO achieved Q₃ scores of 89.93% and 87.71% on the standard test datasets RS126 and CB396, respectively. However, some areas of the algorithm needed improvement; most importantly, the time complexity of the rule generation step needed to be reduced. Recently, we developed a modified rule generation algorithm, ERT-RICO (Exhaustive Relaxed Threshold Rule Induction from Coverings), that addresses this issue. The research team is now able to run much larger test datasets with different choices of segment length and threshold value; preliminary tests achieved a Q₃ score of 92.19% on the standard test dataset RS126. The modified algorithm, its mathematical definitions, and the improved time/space complexity are discussed in this paper.
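The Q₃ scores quoted in the abstract measure three-state accuracy: the percentage of residues whose predicted secondary-structure state matches the observed one. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes the common three-letter encoding (H for helix, E for strand, C for coil); the function name `q3_score` and the example sequences are hypothetical and do not come from RS126 or CB396.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted state
    equals the observed state (three-state per-residue accuracy)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count matching positions, then express as a percentage.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example (illustrative only).
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(f"Q3 = {q3_score(predicted, observed):.2f}%")
```

In this made-up example, 8 of the 10 residues match, so the score is 80%. A dataset-level Q₃ is typically computed over all residues of all test proteins rather than by averaging per-protein scores, though reporting conventions vary.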

Keywords :

bioinformatics; biological techniques; data mining; molecular biophysics; molecular configurations; proteins; BLAST-RT-RICO; ERT-RICO; Exhaustive Relaxed Threshold Rule Induction from Coverings; amino acid sequence; association rule mining; exhaustive RT-RICO algorithm; modified association rule learning approach; multiple sequence alignment information; protein secondary structure prediction problem; rule generation step time complexity; Accuracy; Amino acids; Association rules; Complexity theory; Prediction algorithms; Prediction methods; protein secondary structure prediction

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2012 IEEE Symposium on

Conference_Location :

San Diego, CA

Print_ISBN :

978-1-4673-1190-8

Type :

conf

DOI :

10.1109/CIBCB.2012.6217239

Filename :

6217239

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2319774