DocumentCode :
2767298
Title :
Neural and Statistical Classification to Families of Bio-sequences
Author :
Daoud, Mosaab ; Kremer, Stefan C.
Author_Institution :
Guelph Univ., Guelph
fYear :
2006
fDate :
0-0 0
Firstpage :
699
Lastpage :
704
Abstract :
In this paper we present a novel technique, designed for classifying families of biological sequences, that computes feature vectors for use with artificial neural networks and other pattern recognition techniques. Such sequences present unique challenges because they vary in length and often consist of many symbols relative to the number of exemplars available. The latter property makes it particularly difficult to avoid overgeneralization. We explore a novel approach that computes the entropy of pair-wise correlations between co-occurring symbols in the strings to generate feature vectors that are of fixed size, much smaller than the original string lengths, and still effective at discerning differences between classes of strings. We apply the technique to an RNA family classification problem and show its effectiveness.
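The abstract does not spell out the exact feature computation, so the following is only a minimal sketch under one assumed interpretation, not the authors' method. It assumes that, for every ordered symbol pair (a, b), one collects the positional offsets d = j - i at which a occurs at position i and b at position j, and takes the Shannon entropy of the distribution of those offsets as one feature. The function name `pair_entropy_features`, the `max_gap` parameter, and the RNA alphabet constant are hypothetical. The point it illustrates is that the resulting vector has length |alphabet|^2 (16 for RNA) regardless of sequence length.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical alphabet constant; RNA bases assumed upper-case.
RNA_ALPHABET = "ACGU"


def pair_entropy_features(seq, alphabet=RNA_ALPHABET, max_gap=None):
    """Hypothetical sketch: one entropy value per ordered symbol pair (a, b).

    Assumed interpretation (not taken from the paper): for each pair, count the
    offsets d = j - i (1 <= d <= max_gap) where seq[i] == a and seq[j] == b,
    then return the Shannon entropy (in bits) of that offset distribution.
    The output length is len(alphabet) ** 2, independent of len(seq).
    """
    seq = seq.upper()
    n = len(seq)
    limit = n - 1 if max_gap is None else min(max_gap, n - 1)
    # Positions of each symbol, so pairs are only formed from actual occurrences.
    positions = {s: [i for i, c in enumerate(seq) if c == s] for s in alphabet}

    features = []
    for a, b in product(alphabet, repeat=2):
        gaps = Counter()
        for i in positions[a]:
            for j in positions[b]:
                d = j - i
                if 1 <= d <= limit:
                    gaps[d] += 1
        total = sum(gaps.values())
        # Entropy of the empirical offset distribution; 0.0 if no co-occurrences.
        h = 0.0
        for count in gaps.values():
            p = count / total
            h -= p * math.log2(p)
        features.append(h)
    return features


if __name__ == "__main__":
    # Sequences of different lengths map to vectors of the same fixed size.
    print(len(pair_entropy_features("ACGUACGU")), len(pair_entropy_features("GGAUCC")))
```

Under this assumption, sequences of any length yield equally sized vectors that could be fed to a neural network or another classifier; a different definition of "pair-wise correlation" (for example, mutual information between aligned columns) would produce a different feature set.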
Keywords :
biology computing; entropy; molecular biophysics; molecular configurations; neural nets; pattern classification; RNA family classification; artificial neural networks; bio-sequences; feature vectors; neural classification; pair-wise correlations; pattern recognition; statistical classification; Biological information theory; Computational Intelligence Society; Data mining; Encoding; Feature extraction; Frequency; RNA
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
Type :
conf
DOI :
10.1109/IJCNN.2006.246752
Filename :
1716163