DocumentCode :
2795942
Title :
Variance Analysis in Software Fault Prediction Models
Author :
Jiang, Yue ; Lin, Jie ; Cukic, Bojan ; Menzies, Tim
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear :
2009
fDate :
16-19 Nov. 2009
Firstpage :
99
Lastpage :
108
Abstract :
Software fault prediction models play an important role in software quality assurance. They identify software subsystems (modules, components, classes, or files) which are likely to contain faults. These subsystems, in turn, receive additional resources for verification and validation activities. Fault prediction models are binary classifiers typically developed using one of the supervised learning techniques from either a subset of the fault data from the current project or from a similar past project. In practice, it is critical that such models provide reliable prediction performance on data not used in training. Variance is an important reliability indicator of software fault prediction models. However, variance is often ignored or barely mentioned in many published studies. In this paper, through the analysis of twelve data sets from a public software engineering repository from the perspective of variance, we explore the following five questions regarding fault prediction models: (1) Do different types of classification performance measures exhibit different variance? (2) Does the size of the data set imply a more (or less) accurate prediction performance? (3) Does the size of the training subset impact the model's stability? (4) Do different classifiers consistently exhibit different performance in terms of the model's variance? (5) Are there differences between variance from 1000 runs and 10 runs of 10-fold cross validation experiments? Our results indicate that variance is a very important factor in understanding fault prediction models and we recommend the best practice for reporting variance in empirical software engineering studies.
Keywords :
learning (artificial intelligence); software fault tolerance; software quality; statistical analysis; binary classifier; reliability indicator; software fault prediction model; software quality assurance; supervised learning; variance analysis; Analysis of variance; Data analysis; Fault diagnosis; Performance analysis; Predictive models; Software engineering; Software measurement; Software quality; Stability; Supervised learning; fault prediction models; machine learning; variance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Reliability Engineering, 2009. ISSRE '09. 20th International Symposium on
Conference_Location :
Mysuru, Karnataka
ISSN :
1071-9458
Print_ISBN :
978-1-4244-5375-7
Electronic_ISBN :
1071-9458
Type :
conf
DOI :
10.1109/ISSRE.2009.13
Filename :
5362090