DocumentCode :
260388
Title :
Select-Bagging: Effectively Combining Gene Selection and Bagging for Balanced Bioinformatics Data
Author :
Dittman, David J. ; Khoshgoftaar, Taghi M. ; Napolitano, Amri ; Fazelpour, Alireza
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2014
fDate :
10-12 Nov. 2014
Firstpage :
413
Lastpage :
419
Abstract :
Bioinformatics datasets have historically been difficult to work with. However, machine learning offers a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process harnesses the power of multiple models but incurs the overhead of computing them. Thus, we must ask whether the benefits outweigh the costs. In this study, we seek to determine whether the ensemble learning technique Select-Bagging improves classification results over feature selection on the training dataset followed by classification (denoted as FS-Classifier in this work) on a series of balanced bioinformatics datasets. We test the two approaches with two filter-based feature rankers, four feature subset sizes and the Naïve Bayes classifier. Our results show that Select-Bagging clearly outperforms FS-Classifier in nearly all scenarios. Subsequent statistical analysis shows that the performance improvement of Select-Bagging over FS-Classifier is statistically significant. Therefore, we can state that Select-Bagging benefits the classification performance of models built on high-dimensional, balanced bioinformatics datasets and should be implemented. To our knowledge, this is the first study that examines the effectiveness of bagging in conjunction with internal feature selection for balanced bioinformatics datasets.
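The contrast drawn in the abstract between FS-Classifier (select features once, then train one classifier) and Select-Bagging (repeat feature selection inside every bootstrap bag) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the signal-to-noise style ranker stands in for the two unnamed filter-based rankers, the Gaussian Naive Bayes model, the 0/1 class labels, the redrawing of bags that miss a class, the majority-vote aggregation, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def rank_features(X, y, k):
    """Return indices of the k features with the highest filter score."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [r[j] for r in pos]
        b = [r[j] for r in neg]
        # Illustrative signal-to-noise score; NOT one of the paper's rankers.
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_nb(X, y, features):
    """Fit a Gaussian Naive Bayes model restricted to the given features."""
    model = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            col = [r[j] for r in rows]
            stats.append((mean(col), pstdev(col) + 1e-3))  # guard against zero spread
        model[c] = (math.log(len(rows) / len(X)), stats)
    return features, model


def predict_nb(fitted, row):
    """Return the class with the highest log-posterior for one instance."""
    features, model = fitted
    best, best_lp = None, -math.inf
    for c, (log_prior, stats) in model.items():
        lp = log_prior
        for j, (mu, sd) in zip(features, stats):
            lp += -math.log(sd) - 0.5 * ((row[j] - mu) / sd) ** 2
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def fs_classifier(X, y, k):
    """Baseline: one feature selection pass on the training set, one model."""
    return fit_nb(X, y, rank_features(X, y, k))


def select_bagging(X, y, k, n_bags=11, seed=0):
    """Ensemble: feature selection is repeated inside every bootstrap bag."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    while len(members) < n_bags:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # assumption: redraw any bag that lacks one of the classes
        Xb = [X[i] for i in idx]
        members.append(fit_nb(Xb, yb, rank_features(Xb, yb, k)))
    return members


def predict_ensemble(members, row):
    """Majority vote over the bagged models (ties go to class 0)."""
    votes = sum(predict_nb(m, row) for m in members)
    return 1 if 2 * votes > len(members) else 0
```

Because each bag reruns the ranker, different ensemble members may end up using different feature subsets, which is the "internal feature selection" the abstract refers to. An odd value of n_bags avoids tied votes.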
Keywords :
Bayes methods; bioinformatics; feature selection; filtering theory; genetics; learning (artificial intelligence); pattern classification; statistical analysis; FS-classifier; Naive Bayes classifier; balanced bioinformatics datasets; ensemble learning; feature subset sizes; filter-based feature rankers; gene bagging; gene selection; machine learning; select-bagging; training dataset; bagging; biological system modeling; computational modeling; data models; measurement; training; ensemble; high-dimensionality
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Bioengineering (BIBE), 2014 IEEE International Conference on
Conference_Location :
Boca Raton, FL
Type :
conf
DOI :
10.1109/BIBE.2014.66
Filename :
7033615