Regional Information Center for Science and Technology - Evolved Features for DNA Sequence Classification and Their Fitness Landscapes

DocumentCode :

22856

Title :

Evolved Features for DNA Sequence Classification and Their Fitness Landscapes

Author :

Ashlock, Wendy ; Datta, Soupayan

Author_Institution :

Department of Computer Science and Engineering, York University, Toronto, Canada

Volume :

Issue :

fYear :

2013

fDate :

Apr-13

Firstpage :

185

Lastpage :

197

Abstract :

A key problem in genomics is the classification and annotation of sequences in a genome. A major challenge is identifying good sequence features. Evolutionary algorithms have the potential to search a large space of features and automatically generate useful ones. This paper proposes a two-stage method that generates features using multiple replicates of a genetic algorithm operating on an augmented finite state machine, called a side effect machine (SEM), and then selects a small diverse feature set using several methods, including a novel method called dissimilarity clustering. We apply our method to three problems related to transposable elements and compare the results to those using $k$-mer features. We are able to produce a small set of interesting and comprehensible features that create random forest classifiers more accurate and less prone to overfitting than those created using $k$-mer features. We analyze the SEM fitness landscapes and discuss the use of different fitness functions.
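
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of how $k$-mer features are commonly computed from a DNA sequence. It is an illustration only and is not taken from the paper: the function name `kmer_features`, the default `k=2`, and the choice to normalize counts to frequencies are all assumptions made here.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=2, alphabet="ACGT"):
    """Return k-mer frequencies in fixed lexicographic order (hypothetical helper).

    Windows containing symbols outside `alphabet` (for example N) are ignored.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all len(alphabet)**k possible k-mers so the vector has a fixed layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    # Normalize to frequencies; an all-zero vector is returned if no valid window exists.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    features = kmer_features("ACGTACGT", k=2)
    print(len(features))
```

Under these assumptions the feature vector has $4^k$ entries, so it grows quickly with $k$; this is one reason a small, diverse set of generated features, as described in the abstract, can be attractive.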

Keywords :

Bioinformatics; DNA; Genetic algorithms; Genomics; Microwave integrated circuits; Training; Automatic feature generation; DNA sequence classification; clustering; fitness landscape; side effect machines (SEMs);

fLanguage :

English

Journal_Title :

Evolutionary Computation, IEEE Transactions on

Publisher :

IEEE

ISSN :

1089-778X

Type :

jour

DOI :

10.1109/TEVC.2012.2207120

Filename :

6232454

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=22856