DocumentCode :
2135559
Title :
Side effect machines for sequence classification
Author :
Ashlock, Daniel ; Warner, Elizabeth
Author_Institution :
Math. & Stat., Univ. of Guelph, Guelph, ON
fYear :
2008
fDate :
4-7 May 2008
Abstract :
Finite state machines are routinely used to efficiently recognize patterns in strings. The internal state structure of the machine is typically only of peripheral interest, appearing in algorithms only when the number of states is minimized in the interests of efficiency of execution or comparison. A side effect machine saves information about the internal transitions of the state machine. This record of internal state transitions forms an induced feature set for the string run through the machine. In this study the number of times a machine passes through each state is used as a numerical feature set for classification. Finite state machines are trained with an evolutionary algorithm to produce feature sets that are very easy for an unsupervised learning algorithm, k-means clustering, to learn. The system is demonstrated on a collection of synthetic DNA sequences with bounded randomness. The parameters (number of states, population size, and mutation rates) are explored to characterize their effect on performance. The machines achieve perfect classification on easy examples and good classification on more difficult examples. Parameter choice has a substantial impact on performance.
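The state-visit feature described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transition table `EXAMPLE_MACHINE`, the function name `state_visit_counts`, and the choice to count only states entered after reading a symbol (so the start state is not counted before any input) are assumptions made for illustration.

```python
ALPHABET = "ACGT"  # DNA alphabet; the column order of the transition table follows this string


def state_visit_counts(transitions, sequence, start=0):
    """Run `sequence` through the machine and count how often each state is entered.

    `transitions[s][i]` is the next state when the machine is in state `s`
    and reads symbol `ALPHABET[i]`. The returned list has one count per state
    and serves as the numerical feature vector for the sequence.
    """
    counts = [0] * len(transitions)
    state = start
    for symbol in sequence:
        state = transitions[state][ALPHABET.index(symbol)]
        counts[state] += 1  # the "side effect": record the visit
    return counts


# Hypothetical 3-state machine (assumed for illustration, not taken from the paper).
# Row = current state; columns = next state on A, C, G, T.
EXAMPLE_MACHINE = [
    [1, 0, 2, 0],
    [1, 2, 0, 1],
    [0, 2, 2, 1],
]

features = state_visit_counts(EXAMPLE_MACHINE, "ACGTAC")
print(features)
```

In the setting the abstract describes, one such vector would be computed per sequence and the vectors passed to k-means clustering, while the evolutionary algorithm searches over transition tables for machines whose feature vectors cluster well.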
Keywords :
DNA; biology computing; evolutionary computation; finite state machines; pattern classification; string matching; DNA sequence classification; evolutionary algorithm; finite state machine; internal state structure; internal state transition; k-means clustering; numerical feature set; pattern recognition; side effect machine; unsupervised learning algorithm; Automata; Chaos; Clustering algorithms; DNA; Evolutionary computation; Fractals; Mathematics; Sequences; Statistics; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2008. CCECE 2008. Canadian Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
0840-7789
Print_ISBN :
978-1-4244-1642-4
Electronic_ISBN :
0840-7789
Type :
conf
DOI :
10.1109/CCECE.2008.4564782
Filename :
4564782