• DocumentCode
    2451
  • Title
    An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data
  • Author
    Hualong Yu; Jun Ni
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Jiangsu Univ. of Sci. & Technol., Zhenjiang, China
  • Volume
    11
  • Issue
    4
  • fYear
    2014
  • fDate
    July-Aug. 2014
  • Firstpage
    657
  • Lastpage
    666
  • Abstract
    Training classifiers on skewed data is technically challenging, and the task becomes even more difficult when the data are also high-dimensional. Skewed data of this kind often appear in the biomedical field. In this study, we address the problem by combining the asymmetric bagging ensemble classifier (asBagging) presented in previous work with an improved random subspace (RS) generation strategy called feature subspace (FSS). Specifically, FSS is a novel method for improving the balance between the accuracy and the diversity of the base classifiers in asBagging. In view of the strong generalization capability of the support vector machine (SVM), we adopt it as the base classifier. Extensive experiments on four benchmark biomedical data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of the Accuracy, F-measure, G-mean, and AUC evaluation criteria; it can therefore be regarded as an effective and efficient tool for dealing with high-dimensional and imbalanced biomedical data.
  • Keywords
    medical computing; pattern classification; random processes; support vector machines; AUC evaluation criterions; F-measure; G-mean; SVM; asBagging; asymmetric bagging ensemble classifier; benchmark biomedicine data sets; feature subspace; generalization capability; high-dimensional classification; imbalanced biomedicine data classification; improved ensemble learning method; random subspace generation; skewed data; support vector machine; training classifiers; Bioinformatics; Cancer; Feature extraction; Frequency selective surfaces; Support vector machines; Training; Bioinformatics; class imbalance; ensemble learning; high-dimensional biomedicine data
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2306838
  • Filename
    6747343
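
The abstract describes drawing balanced training bags from skewed data, giving each base classifier a feature subset, and scoring with F-measure and G-mean. The sketch below is only a rough illustration of those ideas under stated assumptions, not the authors' method: it keeps every minority sample and draws an equal number of majority samples without replacement, and it uses a plain uniform random subspace as a stand-in for FSS, whose details are not given in this record. The function names `asymmetric_bags` and `f_measure_and_g_mean` and all parameters are hypothetical, and base-classifier training (SVM in the paper) is left out.

```python
import math
import random


def asymmetric_bags(labels, n_features, n_bags, subspace_size, minority=1, seed=0):
    """Yield (sample_indices, feature_indices) for each base classifier.

    Assumptions (not taken from the paper): every minority sample is kept,
    an equal number of majority samples is drawn without replacement, and
    features are drawn uniformly at random (plain random subspace, not FSS).
    Requires at least as many majority samples as minority samples.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    for _ in range(n_bags):
        sampled_majority = rng.sample(majority_idx, len(minority_idx))
        features = sorted(rng.sample(range(n_features), subspace_size))
        yield sorted(minority_idx + sampled_majority), features


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-measure, G-mean), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # G-mean is the geometric mean of the two per-class recalls.
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean
```

Each yielded pair could be used to slice a data matrix before fitting one base classifier, with the ensemble prediction then formed by voting. Both metrics stay low when the minority class is misclassified, which is why they are often preferred to plain accuracy on imbalanced data.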