DocumentCode :
522824
Title :
Nearest neighbor training of side effect machines for sequence classification
Author :
Ashlock, Daniel ; McEachern, Andrew
Author_Institution :
Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON, Canada
fYear :
2010
fDate :
2-5 May 2010
Firstpage :
1
Lastpage :
8
Abstract :
Side effect machines operate by associating side effects with the states of a finite state machine. The use of side effect machines permits the researcher to leverage information stored in the state transition structure, making machines that might be identical as recognizers behave differently as classifiers. The side effect machines in this study associate a counter with each state so that the number of times each state is visited becomes a numerical feature associated with that state. The key to effective use of these numerical features is to locate side effect machines for which the count vectors are good feature sets. In this study, side effect machines are selected with an evolutionary algorithm. The Rand index of nearest neighbor classification of the count vectors serves as the fitness function for selecting side effect machines. A parameter study is performed on simple synthetic data, and then side effect machines are trained to classify two sets of biological sequences. The first set comprises two categories of HLA sequences from the human major histocompatibility complex. The second comprises positive and negative examples of human endogenous retroviral sequences taken from the human genome. The retroviral sequences are challenging, but good results are obtained. The HLA data are classified with complete accuracy.
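The fitness evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transition-table encoding, the choice to count a state on arrival (not at the start), the squared Euclidean distance, and the leave-one-out nearest neighbor scheme are all assumptions made for illustration, as are the function names `count_vector`, `nearest_neighbor_labels`, `rand_index`, and `fitness`. The evolutionary search over machines is omitted.

```python
from itertools import combinations

ALPHABET = "ACGT"  # assumed DNA alphabet; the record does not specify the encoding


def count_vector(transitions, sequence, start=0):
    """Run the machine over the sequence and count arrivals at each state.

    transitions[s][k] is the next state from state s on symbol ALPHABET[k].
    Assumption: the start state is not counted until it is re-entered.
    """
    counts = [0] * len(transitions)
    state = start
    for symbol in sequence:
        state = transitions[state][ALPHABET.index(symbol)]
        counts[state] += 1
    return counts


def nearest_neighbor_labels(vectors, labels):
    """Leave-one-out 1-nearest-neighbor prediction (assumed scheme)."""
    predicted = []
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        best = min(
            others,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])),
        )
        predicted.append(labels[best])
    return predicted


def rand_index(a, b):
    """Fraction of item pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def fitness(transitions, sequences, labels):
    """Rand index between true labels and nearest neighbor predictions."""
    vectors = [count_vector(transitions, s) for s in sequences]
    return rand_index(labels, nearest_neighbor_labels(vectors, labels))


# Hypothetical two-state machine: state 0 on A or T, state 1 on C or G,
# so the count vector is simply (AT count, GC count).
transitions = [[0, 1, 1, 0], [0, 1, 1, 0]]
sequences = ["AATAAC", "ATTAAA", "GCGCGA", "CCGGCC"]
labels = [0, 0, 1, 1]
print(fitness(transitions, sequences, labels))
```

In this toy data the two classes differ in GC content, so even a two-state machine separates them; an evolutionary algorithm would search over transition tables to maximize `fitness` on harder data.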
Keywords :
biology computing; evolutionary computation; finite state machines; genomics; learning (artificial intelligence); pattern classification; Rand index; biological sequences; count vectors; evolutionary algorithm; finite state machine; fitness function; human endogenous retroviral sequences; human genome; human major histocompatibility complex; nearest neighbor training; sequence classification; side effect machines; state transition structure; synthetic data; Automata; Bioinformatics; Clustering algorithms; DNA; Evolution (biology); Evolutionary computation; Genomics; Humans; Nearest neighbor searches; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4244-6766-2
Type :
conf
DOI :
10.1109/CIBCB.2010.5510426
Filename :
5510426