• DocumentCode
    3335336
  • Title
    An Empirical Study of the Classification Performance of Learners on Imbalanced and Noisy Software Quality Data
  • Author
    Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason ; Folleco, Andres
  • Author_Institution
    Florida Atlantic Univ., Boca Raton
  • fYear
    2007
  • fDate
    13-15 Aug. 2007
  • Firstpage
    651
  • Lastpage
    658
  • Abstract
    In the domain of software quality classification, data mining techniques are used to construct models (learners) for identifying software modules that are most likely to be fault-prone. The performance of these models, however, can be negatively affected by class imbalance and noise. Data sampling techniques have been proposed to alleviate the problem of class imbalance, but the impact of data quality on these techniques has not been adequately addressed. We examine the combined effects of noise and imbalance on classification performance when seven commonly-used sampling techniques are applied to software quality measurement data. Our results show that some sampling techniques are more robust in the presence of noise than others. Further, sampling techniques are affected by noise differently given different levels of imbalance.
  • Keywords
    data mining; sampling methods; software quality; data mining techniques; data sampling techniques; software quality data; Data mining; Fault diagnosis; Noise level; Noise measurement; Sampling methods; Software engineering; Software measurement; Software performance; Software quality; Software systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    1-4244-1500-4
  • Electronic_ISBN
    1-4244-1500-4
  • Type
    conf
  • DOI
    10.1109/IRI.2007.4296694
  • Filename
    4296694