DocumentCode :
1365241
Title :
Improving Software-Quality Predictions With Data Sampling and Boosting
Author :
Seiffert, Chris ; Khoshgoftaar, Taghi M. ; Van Hulse, Jason
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Volume :
39
Issue :
6
fYear :
2009
Firstpage :
1283
Lastpage :
1294
Abstract :
Software-quality data sets tend to fall victim to the class-imbalance problem that plagues so many other application domains. The majority of faults in a software system, particularly high-assurance systems, usually lie in a very small percentage of the software modules. This imbalance between the number of fault-prone (fp) and non-fp (nfp) modules can have a severely negative impact on a data-mining technique's ability to differentiate between the two. This paper addresses the class-imbalance problem as it pertains to the domain of software-quality prediction. We present a comprehensive empirical study examining two different methodologies, data sampling and boosting, for improving the performance of decision-tree models designed to identify fp software modules. This paper applies five data-sampling techniques and boosting to 15 software-quality data sets of different sizes and levels of imbalance. Nearly 50,000 models were built for the experiments contained in this paper. Our results show that while data-sampling techniques are very effective in improving the performance of such models, boosting almost always outperforms even the best data-sampling techniques. This significant result, which, to our knowledge, has not been previously reported, has important consequences for practitioners developing software-quality classification models.
Keywords :
data mining; decision trees; software architecture; software quality; data boosting; data mining; data sampling; decision-tree models; fault-prone modules; non-fp modules; software modules; software quality data sets; software system; Binary classification; boosting; class imbalance; classification; sampling; software quality;
fLanguage :
English
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4427
Type :
jour
DOI :
10.1109/TSMCA.2009.2027131
Filename :
5233804