Title :
Classification of cohesin family using class specific motifs
Author :
Eser, Ercument M. ; Arslan, Burak R. ; Sezerman, Ugur O.
Author_Institution :
Comput. Eng., Galatasaray Univ., Istanbul, Turkey
Abstract :
Motif extraction from protein sequences is a challenging task in bioinformatics. Class-specific motifs, which occur frequently in one class but only rarely in the others, can be used to classify protein sequences with high accuracy. In this study, we present a new scoring-based method for class-specific n-gram motif selection using reduced amino acid alphabets. Cohesin protein sequences, which interact with Dockerin modules to assemble the cellulosome, a complex that degrades cellulose, the most common and abundant organic polymer, are used for class-specific motif selection, and the selected motifs are then given to the J48 and SVM algorithms as features. Classification results are examined across various n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained using the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families.
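The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the score used here (the fraction of a class's sequences containing an n-gram, minus the highest such fraction in any other class) is an assumption. The amino acid grouping `TOY_GROUPS` is a hypothetical toy reduction, not the Gbmr14 alphabet, and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical toy reduction for illustration only; NOT the Gbmr14 alphabet.
TOY_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
# Map every residue to the first letter of its group.
REDUCE = {aa: group[0] for group in TOY_GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet (unknown residues become 'X')."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def ngrams(seq, n):
    """Return the set of distinct n-grams in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def class_ratios(sequences, n):
    """Fraction of sequences in one class that contain each reduced n-gram."""
    counts = Counter()
    for seq in sequences:
        counts.update(ngrams(reduce_sequence(seq), n))
    total = len(sequences)
    return {gram: c / total for gram, c in counts.items()}


def select_motifs(classes, n, k):
    """Pick the k highest-scoring n-grams per class.

    Assumed score: own-class ratio minus the largest ratio in any other class.
    """
    ratios = {label: class_ratios(seqs, n) for label, seqs in classes.items()}
    selected = {}
    for label, own in ratios.items():
        scored = []
        for gram, ratio in own.items():
            other = max(
                (ratios[o].get(gram, 0.0) for o in ratios if o != label),
                default=0.0,
            )
            scored.append((ratio - other, gram))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        selected[label] = [gram for _, gram in scored[:k]]
    return selected


def featurize(seq, motifs, n):
    """Binary presence vector of the selected motifs, usable as classifier input."""
    present = ngrams(reduce_sequence(seq), n)
    return [1 if m in present else 0 for m in motifs]
```

Vectors produced by `featurize` could then be passed to a decision tree or SVM implementation of the reader's choice; the classifier itself is left out of the sketch.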
Keywords :
bioinformatics; pattern classification; proteins; support vector machines; Dockerin modules; Gbmr14 alphabet; J48 algorithms; SVM algorithms; bioinformaticians; cellulosome; class specific motifs; class-specific n-gram motif selection; cohesin family classification; cohesin protein sequences; feature number; motif extraction; n-gram sizes; organic polymer; protein families; reduced amino acid alphabets; scoring based method; Abstracts; Accuracy; Amino acids; Bioinformatics; Classification algorithms; Proteins; Training; Class-specific motifs; Cohesin; Protein Classification; Reduced Amino Acid Alphabets; n-gram;
Conference_Title :
Health Informatics and Bioinformatics (HIBIT), 2013 8th International Symposium on
Conference_Location :
Ankara
Print_ISBN :
978-1-4799-0700-7
DOI :
10.1109/HIBIT.2013.6661687