DocumentCode :
1497732
Title :
An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and Its Application to DNA Splice Site Prediction
Author :
Kamath, Uday ; Compton, Jack ; Islamaj-Dogan, Rezarta ; De Jong, Kenneth A. ; Shehu, Amarda
Author_Institution :
Dept. of Comput. Sci., George Mason Univ., Ashburn, VA, USA
Volume :
9
Issue :
5
fYear :
2012
Firstpage :
1387
Lastpage :
1398
Abstract :
Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
Keywords :
DNA; biological techniques; evolutionary computation; genetic algorithms; molecular biophysics; support vector machines; DNA splice site prediction; biological sequence data; evolutionary algorithm approach; feature generation; gene-finding problem; genetic programming; machine learning methods; state-of-the-art approach; Accuracy; Bioinformatics; Prediction algorithms; Training data; DNA splice sites; classifier design and evaluation; data mining; feature extraction and construction; Algorithms; Computational Biology; Pattern Recognition, Automated; RNA Splicing; Sequence Analysis, DNA
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2012.53
Filename :
6185531