Title :
Imbalanced Protein Data Classification Using Ensemble FTM-SVM
Author_Institution :
Sch. of Math. & Stat., Guangdong Univ. of Finance & Econ., Guangzhou, China
Abstract :
Classifying protein sequences into functional and structural families with machine learning methods is an active research topic in machine learning and bioinformatics. The underlying protein classification problem is a large multiclass problem, which can generally be reduced to a set of binary classification problems: the proteins in one class are treated as positive examples and those outside the class as negative examples. This reduction introduces class imbalance, because the number of proteins in one class is usually much smaller than the number outside it. To address this challenge, we propose a novel framework for protein classification. We first use free scores (FS) to extract features from the proteins; then, inverse random under sampling (IRUS) is used to create a large number of distinct training sets; next, we use a new ensemble approach to combine these distinct training sets with a new fuzzy total margin support vector machine (FTM-SVM) that we have constructed. We call the resulting classifier the ensemble fuzzy total margin support vector machine (EnFTM-SVM). We give a full description of the method, including the details of its derivation. Finally, experimental results on fourteen benchmark protein data sets indicate that the proposed method outperforms many state-of-the-art protein classification methods.
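The one-vs-rest reduction and the IRUS resampling step described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names `one_vs_rest_labels` and `irus_subsets` and the `majority_fraction` parameter are hypothetical, and the sketch assumes that IRUS keeps every minority (positive) example while drawing fewer majority (negative) examples than positives, so that the class ratio is inverted in each training set. The FS feature extraction, the FTM-SVM base learner, and the ensemble combination rule are not specified in the abstract and are not reproduced here.

```python
import random


def one_vs_rest_labels(labels, target):
    """Reduce multiclass labels to binary: +1 for the target class, -1 otherwise."""
    return [1 if y == target else -1 for y in labels]


def irus_subsets(binary_labels, n_sets, majority_fraction=0.5, seed=0):
    """Build n_sets training index lists with an inverted class ratio.

    Assumption (not taken from the abstract): each list keeps every positive
    (minority) index and draws, without replacement, a number of negative
    (majority) indices equal to majority_fraction times the positive count.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == -1]
    # Number of negatives per set, capped by how many negatives exist.
    n_neg = min(max(1, int(len(pos) * majority_fraction)), len(neg))
    return [pos + rng.sample(neg, n_neg) for _ in range(n_sets)]


# Toy data: family "A" is the rare positive class.
labels = ["A"] * 4 + ["B"] * 10 + ["C"] * 6
y = one_vs_rest_labels(labels, "A")
subsets = irus_subsets(y, n_sets=5)
for s in subsets:
    n_pos = sum(1 for i in s if y[i] == 1)
    print(len(s), n_pos, len(s) - n_pos)
```

Each index list would then be used to train one base classifier, and the per-classifier outputs combined by whatever ensemble rule is chosen. Because the negatives are redrawn for every list, the training sets differ from one another while always containing the full positive class.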
Keywords :
bioinformatics; feature extraction; proteins; proteomics; support vector machines; binary classification problems; ensemble FTM-SVM; ensemble fuzzy total margin support vector machine; inverse random under sampling; machine learning methods; protein sequence classification; Hidden Markov models; Noise; Protein sequence; Training; Class imbalance; classification; ensemble; protein; support vector machine (SVM)
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2015.2431292