DocumentCode :
2834491
Title :
Application of neural networks for protein sequence classification
Author :
Sharma, Sameer ; Kumar, Vinod ; Rani, T. Sobha ; Bhavani, S. Durga ; Raju, S. Bapi
Author_Institution :
Dept. of Comput. & Inf. Sci., Hyderabad Univ., India
fYear :
2004
fDate :
2004
Firstpage :
325
Lastpage :
328
Abstract :
Protein sequence classification is modelled as a binary classification problem in which an unlabeled protein sequence is checked to see whether or not it belongs to a known set of protein superfamilies. In this paper we use multilayer perceptrons with a supervised learning algorithm to learn the binary classification. The training data consist of two sets: a positive set of sequences belonging to an identified protein superfamily, and a negative set comprising sequences from other superfamilies. When applying neural networks, the first problem to be addressed is feature extraction. Here we use the feature extraction techniques proposed by Wang et al. Simulations show that the neural network classifies with good precision for the myosin and photochrome superfamilies in the data set that we chose as positive. The results for the globin superfamily are also good, which supports the feature extraction methodology and the application of neural networks to protein sequence classification suggested by Wang et al. For the actin and ribonuclease superfamilies, however, the network performed poorly. One possible reason is that the choice of sequences in the negative data set is not optimal. We conclude that classification performance depends on a proper selection of sequences for the positive and negative data sets.
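The pipeline described in the abstract (extract a fixed-length feature vector from each sequence, then train a multilayer perceptron on positive and negative examples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the Wang et al. features, so plain amino-acid composition is used as a stand-in, and the names composition_features and TinyMLP, the network size, the learning rate, and the toy sequences are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(sequence):
    """Relative frequency of each standard amino acid (20 values).

    Assumption: a stand-in feature, not the Wang et al. encoding.
    """
    counts = [0] * len(AMINO_ACIDS)
    for residue in sequence.upper():
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # skip non-standard symbols such as X or B
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units, trained by per-sample gradient descent."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict_proba(self, x):
        """Estimated probability that x belongs to the positive set."""
        return self._forward(x)[1]

    def fit(self, X, y, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, target in zip(X, y):
                hidden, out = self._forward(x)
                # Cross-entropy loss with a sigmoid output gives this simple delta.
                delta_out = out - target
                for j, h in enumerate(hidden):
                    # Back-propagate through hidden unit j before updating w2[j].
                    delta_h = delta_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * delta_out * h
                    self.b1[j] -= lr * delta_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out


# Hypothetical toy strings, not real superfamily members.
positive = ["MKKLLEEAAK", "KKEELLMAAK"]   # label 1: the chosen superfamily
negative = ["GGSSPPTTGS", "PPGGSSTTGP"]   # label 0: other superfamilies

X = [composition_features(s) for s in positive + negative]
y = [1] * len(positive) + [0] * len(negative)

model = TinyMLP(n_in=len(AMINO_ACIDS), n_hidden=3)
model.fit(X, y)
score = model.predict_proba(composition_features("MKELLKAAEK"))
```

A score above a chosen threshold (0.5 is a common default) would label the query sequence as a member of the positive superfamily. Because the negative set defines what "not a member" looks like during training, swapping in different negative sequences changes the learned decision boundary, which is consistent with the abstract's conclusion about data set selection.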
Keywords :
feature extraction; learning (artificial intelligence); molecular biophysics; multilayer perceptrons; pattern classification; proteins; actin; binary classification; data sets; feature extraction; globin superfamily; multilayer perceptrons; myosin superfamily; neural networks; photochrome superfamily; protein sequence classification; protein superfamily; ribonuclease; supervised learning algorithm; training data; Amino acids; Application software; Databases; Feature extraction; Frequency measurement; Hidden Markov models; Neural networks; Protein sequence; Training data; Viterbi algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the International Conference on Intelligent Sensing and Information Processing, 2004
Print_ISBN :
0-7803-8243-9
Type :
conf
DOI :
10.1109/ICISIP.2004.1287676
Filename :
1287676