DocumentCode :
1398667
Title :
Prediction of membrane protein types by using dipeptide and pseudo amino acid composition-based composite features
Author :
Hayat, M. ; Khan, Ajmal
Author_Institution :
DCIS, Pakistan Inst. of Eng. & Appl. Sci., Islamabad, Pakistan
Volume :
6
Issue :
18
fYear :
2012
Firstpage :
3257
Lastpage :
3264
Abstract :
Membrane proteins are fundamental elements of a cell and play essential roles in nearly all cellular processes. Predicting membrane protein types through biological experiments is often complicated and time-consuming. It is therefore highly desirable to develop a robust, reliable and high-throughput in silico method for predicting membrane protein types. In this study, the authors use two feature extraction strategies, dipeptide composition and pseudo amino acid (PseAA) composition, to classify membrane protein types. A composite model is also developed by concatenating the dipeptide and PseAA composition-based features. Two feature selection methods, neighbourhood preserving embedding and locally linear embedding (LLE), are then applied to reduce the dimensionality of the composite model. The performance of these feature extraction strategies is evaluated using four classifiers: K-nearest neighbour, probabilistic neural network (PNN), support vector machine (SVM) and grey incidence degree. The highest success rates are observed with the LLE-based reduced features. SVM yields the best accuracy, 88.2%, in the jackknife test, whereas PNN obtains the highest accuracy, 98.4%, in the independent dataset test. Performance measures other than accuracy are also used, namely the Matthews correlation coefficient, sensitivity and precision. The authors' simulation results show that the composite model discriminates significantly between the types of membrane protein and might be useful for future research and drug discovery.
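The dipeptide composition and the concatenation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `dipeptide_composition` and `composite_features` are hypothetical, the normalisation (counts divided by the number of valid residue pairs) is one common convention and an assumption here, and the PseAA vector is taken as an input supplied by the caller rather than computed.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of normalised dipeptide frequencies.

    Assumption: overlapping adjacent pairs are counted, and pairs that
    contain a non-standard residue are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or without valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def composite_features(dipeptide_vector, pseaa_vector):
    """Concatenate dipeptide and PseAA features into one composite vector.

    The PseAA vector is assumed to be computed elsewhere (hypothetical input).
    """
    return list(dipeptide_vector) + list(pseaa_vector)
```

For a sequence such as `"ACAC"`, the three overlapping pairs are `AC`, `CA` and `AC`, so the `AC` entry is 2/3 and the `CA` entry is 1/3. The resulting composite vector would then be passed to a dimensionality reduction step such as LLE before classification.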
Keywords :
biology computing; biomembranes; cellular biophysics; feature extraction; molecular biophysics; neural nets; proteins; support vector machines; K-nearest neighbour; Matthews correlation coefficient; PNN; PseAA composition; SVM; cellular processes; dipeptide; feature extraction; grey incidence degree; locally linear embedding; membrane protein type prediction; probabilistic neural network; pseudo amino acid; support vector machine;
fLanguage :
English
Journal_Title :
Communications, IET
Publisher :
IET
ISSN :
1751-8628
Type :
jour
DOI :
10.1049/iet-com.2011.0170
Filename :
6412956