DocumentCode :
894760
Title :
A Comprehensive Empirical Study of Count Models for Software Fault Prediction
Author :
Gao, Kehan ; Khoshgoftaar, Taghi M.
Author_Institution :
Dept. of Math. & Comput. Sci., Eastern Connecticut State Univ., Willimantic, CT
Volume :
56
Issue :
2
fYear :
2007
fDate :
6/1/2007
Firstpage :
223
Lastpage :
236
Abstract :
Count models, such as the Poisson regression model and the negative binomial regression model, can be used to obtain software fault predictions. With the aid of such predictions, the development team can improve the quality of operational software. The zero-inflated and hurdle count models may be more appropriate when, for a given software system, the number of modules with faults is very small. Related literature lacks quantitative guidance regarding the application of count models for software quality prediction. This study presents a comprehensive empirical investigation of eight count models in the context of software fault prediction. It includes comparative hypothesis testing, model selection, and performance evaluation for the count models with respect to different criteria. The case study presented is that of a full-scale industrial software system. It is observed that the information obtained from hypothesis testing and model selection techniques was not consistent with the predictive performances of the count models. Moreover, the comparative analysis based on one criterion did not match that of another criterion. However, with respect to a given criterion, the performance of a count model is consistent for both the fit and test data sets. This ensures that, if a fitted model is considered good based on a given criterion, then the model will yield a good prediction based on the same criterion. The relative performances of the eight models are evaluated based on a one-way ANOVA model and Tukey's multiple comparison technique. The comparative study is useful in selecting the best count model for estimating the quality of a given software system.
Keywords :
regression analysis; software fault tolerance; software metrics; software quality; statistical testing; stochastic processes; Pearson chi-square; Poisson regression model; Tukey multiple comparison technique; hypothesis testing; industrial software system; model selection technique; negative binomial regression model; one-way ANOVA model; software fault prediction count model; software metrics; software quality prediction; Analysis of variance; Error analysis; Frequency estimation; Integrated circuit modeling; Predictive models; Probability; Software quality; Software systems; Statistical analysis; Testing; ANOVA; Pearson's chi-square; Tukey's multiple comparison; count models; hypothesis testing; information criteria; software metrics; software quality
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/TR.2007.896761
Filename :
4220784