Author/Authors :
Aydilek, İbrahim Berkan, Harran University, Faculty of Engineering, Department of Computer Engineering, Şanlıurfa, Turkey
Title Of Article :
Analyzing and improving information gain of metrics used in software defect prediction in decision trees
Abstract :
McCabe and Halstead method-level metrics are among the best-known and most widely used quantitative software metrics for measuring software quality in a concrete way. Software defect prediction estimates which sub-modules of the software under development are likely to be more defect-prone, so that loss of labor and time can be avoided. The datasets used for software defect prediction usually have an unbalanced class distribution, since records of the defective class can be fewer than records of the non-defective class, and this imbalance adversely affects the results of machine learning methods. Information gain is employed in decision trees, decision-tree-based rule classifiers, and attribute selection methods. In this study, software metrics that provide important information for software defect prediction were investigated, and the CM1, JM1, KC1 and PC1 datasets from NASA's PROMISE software repository were balanced with the synthetic over-sampling SMOTE algorithm and improved in terms of information gain. As a result, software defect prediction datasets with higher classification performance and software metrics with increased information gain ratio were obtained for decision trees.
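As an illustration of the information gain ratio mentioned in the abstract, the following is a minimal Python sketch of the standard textbook definitions (entropy, information gain, and gain ratio for a discrete attribute). It is not the author's implementation; the attribute values and class labels in the toy example are hypothetical and are not taken from the CM1, JM1, KC1 or PC1 datasets.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class labels within each branch.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return information_gain(values, labels) / split_info


# Hypothetical toy data: a discretized complexity metric and defect labels.
complexity = ["high", "high", "high", "low", "low", "low", "low", "low"]
defective = ["yes", "yes", "no", "no", "no", "no", "no", "yes"]
print(gain_ratio(complexity, defective))
```

A metric whose values separate defective from non-defective records more cleanly yields a higher gain ratio, which is the quantity the study reports as improving after the datasets are balanced.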
NaturalLanguageKeyword :
Software defect prediction, Decision trees, Information gain ratio
JournalTitle :
Pamukkale University Journal Of Engineering Sciences