Title :
A novel semi-supervised approach for protein sequence classification
Author :
Chaturvedi, Bharti ; Patil, Nagamma
Author_Institution :
Dept. of Inf. Technol., Nat. Inst. of Technol. Karnataka, Mangalore, India
Abstract :
Bioinformatics is an emerging research area, and classification of protein sequence datasets is a major challenge for researchers. This paper deals with supervised and semi-supervised classification of human protein sequences. Amino acid composition (AAC) is used for feature extraction from the protein sequences. Classification techniques such as Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbour (KNN), Random Forest, and Decision Tree are used to classify the protein sequence dataset. Among these classifiers, SVM reported the best result, with the highest accuracy. The limitation of SVM is that it works only with supervised (labeled) datasets. It does not work with unsupervised datasets (unlabeled data) or semi-supervised datasets (a large amount of unlabeled data alongside a small amount of labeled data). A novel semi-supervised support vector machine (SSVM) classifier is proposed that works with a combination of labeled and unlabeled data. The results show that the proposed approach gives higher accuracy on the semi-supervised dataset. Principal component analysis (PCA) is used for feature reduction of the protein sequences. The proposed SSVM using PCA increases accuracy by about 5 to 10%.
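The AAC feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of amino acid composition (the fraction of each of the 20 standard residues in a sequence); the abstract does not give the authors' exact procedure, and the function name, residue ordering, and handling of non-standard characters below are illustrative assumptions.

```python
from collections import Counter

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue fractions for one sequence.

    Characters outside the 20 standard residues (e.g. 'X' or gaps) are
    ignored; this is an assumption, not something stated in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No recognisable residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's dataset.
    features = amino_acid_composition("MKVLAAGIVGLLLA")
    print(len(features), round(sum(features), 6))
```

Each sequence maps to a fixed-length vector regardless of its length, which is what allows classifiers such as SVM, and a reduction step such as PCA, to be applied to sequences of differing lengths.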
Keywords :
Bayes methods; bioinformatics; decision trees; feature extraction; pattern classification; principal component analysis; proteins; support vector machines; AAC; KNN; PCA; SSVM classifier; amino acid composition; decision tree; feature reduction; human protein sequence; k-nearest neighbour; naive Bayes; protein sequence classification; protein sequence dataset classification; random forest; semisupervised classification; semisupervised support vector machine classifier; Accuracy; Amino acids; Protein sequence
Conference_Title :
Advance Computing Conference (IACC), 2015 IEEE International
Conference_Location :
Bangalore
Print_ISBN :
978-1-4799-8046-8
DOI :
10.1109/IADCC.2015.7154885