Title :
Discretization methods for NBC in effort estimation: An empirical comparison based on ISBSG projects
Author :
Fernandez-Diego, M. ; Torralba-Martinez, J.-M.
Author_Institution :
Dept. of Bus. Adm., Univ. Politec. de Valencia, Valencia, Spain
Abstract :
Background: Bayesian networks have been applied in many fields, including effort estimation in software engineering. Even though there are Bayesian inference algorithms that can handle continuous variables, performance tends to be better when these variables are discretized than when they are assumed to follow a specific distribution. On the other hand, the choice of discretization method and the number of discretized intervals may lead to significantly different estimation results. However, discretization issues are seldom mentioned in software engineering effort estimation models. Aim: This paper seeks to show that discretization issues matter for prediction accuracy when building a Naive Bayes Classifier (NBC) for estimating software effort. Method: For this purpose, an NBC model has been developed for software effort estimation based on ISBSG projects, applying different discretization schemes (equal width intervals, equal frequency intervals, and k-means clustering) and using different numbers of intervals. Results: For the NBC model built, k-means clustering improves on the estimation accuracy of equal frequency discretization only with respect to Pred(0.25), although it better reflects the original distribution. Conclusions: Further experimentation should determine the potential of clustering methods already highlighted in other fields.
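Illustrative note: the three discretization schemes named in the abstract, and the Pred(0.25) accuracy measure, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how cut points were chosen, how k-means was initialized, or which variables were discretized, so every function name below is hypothetical and the effort values are made up. Pred(l) is taken in its usual sense, the fraction of estimates whose magnitude of relative error is at most l.

```python
def equal_width_edges(values, k):
    """Cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Cut points placing roughly len(values) / k observations in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def kmeans_1d_edges(values, k, iterations=100):
    """Cut points halfway between adjacent 1-D k-means centroids (Lloyd's algorithm)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start at evenly spaced positions in the sorted data.
    centroids = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def assign_bin(value, edges):
    """Index of the interval that contains value, given sorted cut points."""
    return sum(value >= e for e in edges)


def pred(actual, predicted, level=0.25):
    """Fraction of estimates whose magnitude of relative error is at most level."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up, right-skewed effort values (person-hours), for illustration only.
efforts = [100, 120, 150, 200, 250, 400, 800, 1600, 5000, 12000]
for name, make_edges in [("equal width", equal_width_edges),
                         ("equal frequency", equal_frequency_edges),
                         ("k-means", kmeans_1d_edges)]:
    edges = make_edges(efforts, 3)
    print(name, edges, [assign_bin(v, edges) for v in efforts])
```

On skewed data such as this, equal width intervals crowd most observations into the first bin, while equal frequency and k-means cut points follow the data more closely, which is the kind of difference the abstract argues can affect estimation accuracy.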
Keywords :
belief networks; pattern classification; pattern clustering; project management; software cost estimation; Bayesian inference algorithm; Bayesian network; ISBSG project; NBC model; cost estimation; discretization method; discretized interval; equal frequency discretization; equal frequency interval; equal width interval; k-means clustering; naive Bayes classifier; prediction accuracy; software effort estimation; software engineering; software project; Accuracy; Bayes methods; Computational modeling; Estimation; Productivity; Software; Software engineering; Bayesian networks; Effort estimation; ISBSG; Naive Bayes Classifier; discretization methods; software projects;
Conference_Title :
Empirical Software Engineering and Measurement (ESEM), 2012 ACM-IEEE International Symposium on
Conference_Location :
Lund
Print_ISBN :
978-1-4503-1056-7
Electronic_ISBN :
1938-6451
DOI :
10.1145/2372251.2372268