  • DocumentCode
    260388
  • Title
    Select-Bagging: Effectively Combining Gene Selection and Bagging for Balanced Bioinformatics Data
  • Author
    Dittman, David J. ; Khoshgoftaar, Taghi M. ; Napolitano, Amri ; Fazelpour, Alireza
  • Author_Institution
    Florida Atlantic Univ., Boca Raton, FL, USA
  • fYear
    2014
  • fDate
    10-12 Nov. 2014
  • Firstpage
    413
  • Lastpage
    419
  • Abstract
    Bioinformatics datasets have historically been difficult to work with. However, machine learning offers a potentially effective tool for such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process draws on the power of multiple models but incurs the overhead of computing them. Thus, we must ask whether the benefits outweigh the costs. In this study, we seek to determine whether the ensemble learning technique Select-Bagging improves classification results over feature selection on the training dataset followed by classification (denoted as FS-Classifier in this work) on a series of balanced bioinformatics datasets. We test the two approaches with two filter-based feature rankers, four feature subset sizes, and the Naïve Bayes classifier. Our results show that Select-Bagging outperforms FS-Classifier in nearly all scenarios. Subsequent statistical analysis shows that the performance gain of Select-Bagging over FS-Classifier is statistically significant. Therefore, we can state that Select-Bagging benefits the classification performance of models built on high-dimensional, balanced bioinformatics datasets and should be implemented. To our knowledge, this is the first study to examine the effectiveness of bagging in conjunction with internal feature selection for balanced bioinformatics datasets.
  • Keywords
    Bayes methods; bioinformatics; feature selection; filtering theory; genetics; learning (artificial intelligence); pattern classification; statistical analysis; FS-classifier; Naive Bayes classifier; balanced bioinformatics datasets; ensemble learning; feature subset sizes; filter-based feature rankers; gene bagging; gene selection; machine learning; select-bagging; training dataset; Bagging; Biological system modeling; Computational modeling; Data models; Measurement; Training; ensemble; high-dimensionality
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Bioinformatics and Bioengineering (BIBE), 2014 IEEE International Conference on
  • Conference_Location
    Boca Raton, FL
  • Type
    conf
  • DOI
    10.1109/BIBE.2014.66
  • Filename
    7033615
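
The two approaches compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not name the two filter-based rankers, the number of bagging iterations, or how the ensemble combines its members. The sketch therefore assumes a simple class-mean-separation score as a stand-in ranker, eleven bootstrap models, majority voting, a hand-rolled Gaussian Naive Bayes with a small variance floor, and a toy dataset. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def rank_features(X, y, k):
    """Return indices of the k top-scoring features (stand-in filter ranker)."""
    scores = []
    for j in range(len(X[0])):
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        spread = math.sqrt(pvariance(neg)) + math.sqrt(pvariance(pos)) + 1e-9
        scores.append(abs(fmean(pos) - fmean(neg)) / spread)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_nb(X, y, feats):
    """Fit Gaussian Naive Bayes on the selected features only."""
    model = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        stats = [(fmean([r[j] for r in rows]),
                  pvariance([r[j] for r in rows]) + 1e-3)  # variance floor (assumed)
                 for j in feats]
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model


def predict_nb(model, feats, row):
    """Return the class with the highest log-posterior."""
    def log_post(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(feats, stats))
    return max(model, key=log_post)


def fs_classifier(X, y, k):
    """Baseline: select features once on the training set, then fit one model."""
    feats = rank_features(X, y, k)
    return [(feats, fit_nb(X, y, feats))]


def select_bagging(X, y, k, n_models=11, seed=0):
    """Bagging with feature selection repeated inside every bootstrap sample."""
    rng = random.Random(seed)
    ensemble = []
    while len(ensemble) < n_models:
        idx = [rng.randrange(len(X)) for _ in X]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # redraw if a class is missing from the sample
            continue
        feats = rank_features(Xb, yb, k)
        ensemble.append((feats, fit_nb(Xb, yb, feats)))
    return ensemble


def predict(ensemble, row):
    """Majority vote over (feature subset, model) pairs (assumed combiner)."""
    votes = sum(predict_nb(model, feats, row) for feats, model in ensemble)
    return int(2 * votes > len(ensemble))


# Toy balanced data: feature 0 separates the classes, features 1 and 2 do not.
X = [[5 * (i % 2) + 0.2 * (i % 5), (i * 7 % 11) / 10, (i * 3 % 7) / 10]
     for i in range(20)]
y = [i % 2 for i in range(20)]
ensemble = select_bagging(X, y, k=2)
baseline = fs_classifier(X, y, k=2)
print(predict(ensemble, [5.4, 0.5, 0.3]), predict(baseline, [0.4, 0.5, 0.3]))
```

The structural difference is where the ranker runs: `fs_classifier` ranks features once on the full training set, while `select_bagging` re-ranks inside each bootstrap sample, so ensemble members may use different feature subsets.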