• DocumentCode
    600262
  • Title
    Discretization methods for NBC in effort estimation: An empirical comparison based on ISBSG projects
  • Author
    Fernandez-Diego, M. ; Torralba-Martinez, J.-M.
  • Author_Institution
    Dept. of Bus. Adm., Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2012
  • fDate
    20-21 Sept. 2012
  • Firstpage
    103
  • Lastpage
    106
  • Abstract
    Background: Bayesian networks have been applied in many fields, including effort estimation in software engineering. Even though there are Bayesian inference algorithms that can handle continuous variables, performance tends to be better when these variables are discretized than when they are assumed to follow a specific distribution. On the other hand, the choice of the discretization method and the number of discretized intervals may lead to significantly different estimation results. However, discretization issues are seldom mentioned in software engineering effort estimation models. Aim: This paper seeks to show that discretization issues are important in terms of prediction accuracy when building a Naive Bayes Classifier (NBC) for estimating software effort. Method: For this purpose, an NBC model has been developed for software effort estimation based on ISBSG projects, applying different discretization schemes (equal width intervals, equal frequency intervals, and k-means clustering) and using different numbers of intervals. Results: Regarding the NBC model built, the estimation accuracy of equal frequency discretization is only improved by k-means clustering with respect to Pred(0.25), although it better reflects the original distribution. Conclusions: Further experimentation should determine the potential of clustering methods already highlighted in other fields.
  • Keywords
    belief networks; pattern classification; pattern clustering; project management; software cost estimation; Bayesian inference algorithm; Bayesian network; ISBSG project; NBC model; cost estimation; discretization method; discretized interval; equal frequency discretization; equal frequency interval; equal width interval; k-means clustering; naive Bayes classifier; prediction accuracy; software effort estimation; software engineering; software project; Accuracy; Bayes methods; Computational modeling; Estimation; Productivity; Software; Software engineering; Bayesian networks; Effort estimation; ISBSG; Naive Bayes Classifier; discretization methods; software projects;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Empirical Software Engineering and Measurement (ESEM), 2012 ACM-IEEE International Symposium on
  • Conference_Location
    Lund
  • ISSN
    1938-6451
  • Print_ISBN
    978-1-4503-1056-7
  • Electronic_ISBN
    1938-6451
  • Type
    conf
  • DOI
    10.1145/2372251.2372268
  • Filename
    6475402
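
The three discretization schemes named in the abstract (equal width intervals, equal frequency intervals, and k-means clustering) and the Pred(0.25) measure can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the quantile-based k-means initialisation, and the sample effort values are all hypothetical, and Pred(l) is assumed to follow its common definition: the fraction of estimates whose relative error is at most l.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Interior cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points chosen so each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def kmeans_1d_edges(values, k, iterations=50):
    """Interior cut points at midpoints between 1-D k-means centroids (Lloyd's algorithm)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start, with centroids placed at evenly spaced quantiles.
    centroids = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def assign_bins(values, edges):
    """Map each value to the index of its interval; a value equal to an edge goes right."""
    return [bisect_right(edges, v) for v in values]


def pred(actual, estimated, level=0.25):
    """Fraction of estimates whose relative error |actual - estimated| / actual <= level."""
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) / a <= level)
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up, right-skewed effort values (not ISBSG data).
    efforts = [100, 120, 150, 200, 400, 800, 1600, 6400]
    print(assign_bins(efforts, equal_width_edges(efforts, 4)))
    print(assign_bins(efforts, equal_frequency_edges(efforts, 4)))
    print(assign_bins(efforts, kmeans_1d_edges(efforts, 2)))
```

On skewed data such as this sample, equal width intervals place almost every value in the first interval, while equal frequency intervals spread the values evenly. This is one reason the choice of scheme and the number of intervals can change a classifier's results.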