• DocumentCode
    589292
  • Title
    A Hybrid Approach to Coping with High Dimensionality and Class Imbalance for Software Defect Prediction
  • Author
    Gao, Kehan ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio
  • Author_Institution
    Eastern Connecticut State Univ., Willimantic, CT, USA
  • Volume
    2
  • fYear
    2012
  • fDate
    12-15 Dec. 2012
  • Firstpage
    281
  • Lastpage
    288
  • Abstract
    High dimensionality and class imbalance are the two main problems affecting software defect prediction. In this paper, we propose a new technique, named SelectRUSBoost, a form of ensemble learning that incorporates data sampling to alleviate class imbalance and feature selection to resolve high dimensionality. To evaluate the effectiveness of the new technique, we apply it to a group of datasets in the context of software defect prediction. We employ two classification learners and six feature selection techniques. We compare the technique to the approach where feature selection and data sampling are used together, as well as to the case where feature selection is used alone (with no sampling at all). The experimental results demonstrate that the SelectRUSBoost technique is more effective at improving classification performance than the other approaches.
  • Keywords
    data handling; learning (artificial intelligence); software engineering; SelectRUSBoost; class imbalance; data sampling; ensemble learning; high dimensionality; software defect prediction; Boosting; Data models; Measurement; Prediction algorithms; Predictive models; Software; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2012 11th International Conference on
  • Conference_Location
    Boca Raton, FL
  • Print_ISBN
    978-1-4673-4651-1
  • Type
    conf
  • DOI
    10.1109/ICMLA.2012.145
  • Filename
    6406710