DocumentCode :
237359
Title :
A Semi-supervised Approach to Software Defect Prediction
Author :
Huihua Lu ; Cukic, Bojan ; Culp, Mark
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear :
2014
fDate :
21-25 July 2014
Firstpage :
416
Lastpage :
425
Abstract :
Accurate detection of software components that need to be exposed to additional verification and validation offers a path to high-quality products while minimizing nonessential software assurance expenditures. In this type of quality modeling we assume that software modules with known fault content, developed in a similar environment, are available. Supervised learning algorithms are the traditional methods of choice for training on existing modules. The models are then used to predict fault content for newly developed software components prior to product release. However, establishing whether a module contains a fault, solely so that it can be used for model training, can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and to supplement model training with modules for which the fault information is not available, thus reducing the overall cost of quality assurance. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of the software metrics used for prediction. Our results show that the semi-supervised learning algorithm with dimension reduction performs significantly better than one of the best-performing supervised learning algorithms (random forest) in situations where few modules with known fault content are available. We compare our results with the published benchmarks and clearly demonstrate performance benefits.
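The pipeline outlined in the abstract (reduce the software metrics with multidimensional scaling, then learn from a few labeled modules plus many unlabeled ones) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the specific semi-supervised learner, so self-training with a nearest-centroid rule stands in for it; classical (Torgerson) MDS computed by power iteration stands in for the paper's scaling step; and the metric values, the function names classical_mds and self_train, and the two labeled modules are hypothetical toy inputs, not data or results from the paper.

```python
import math
import random


def classical_mds(points, k=2, iters=300, seed=0):
    """Embed points in k dimensions with classical (Torgerson) MDS."""
    n = len(points)
    # Squared Euclidean distance between every pair of modules.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        # Power iteration approximates the leading eigenvector of b.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(bij * vj for bij, vj in zip(b_row, v)) for b_row in b]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        bv = [sum(bij * vj for bij, vj in zip(b_row, v)) for b_row in b]
        lam = sum(vi * x for vi, x in zip(v, bv))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so the next pass finds the next component.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def self_train(coords, known):
    """Self-training stand-in: repeatedly label the most confident module.

    known maps module index -> 0 (clean) or 1 (faulty) and must contain
    at least one example of each class.
    """
    labels = dict(known)
    while len(labels) < len(coords):
        centroids = {}
        for c in (0, 1):
            members = [coords[i] for i, y in labels.items() if y == c]
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        best = None
        for i in range(len(coords)):
            if i in labels:
                continue
            d0 = math.dist(coords[i], centroids[0])
            d1 = math.dist(coords[i], centroids[1])
            # Confidence is the gap between the two centroid distances.
            candidate = (abs(d0 - d1), i, 0 if d0 <= d1 else 1)
            if best is None or candidate > best:
                best = candidate
        labels[best[1]] = best[2]
    return [labels[i] for i in range(len(coords))]


# Hypothetical toy metrics per module (three made-up numeric features).
metrics = [
    [10, 1, 20], [12, 2, 25], [11, 1, 22], [9, 2, 18],
    [200, 15, 400], [220, 18, 450], [210, 16, 420], [190, 14, 380],
]
known = {0: 0, 4: 1}  # only two modules have known fault content
embedded = classical_mds(metrics, k=2)
predicted = self_train(embedded, known)
print(predicted)
```

In this sketch only two of the eight modules carry fault labels; the remaining six contribute to training through the pseudo-labels assigned in each round, which is the cost-saving idea the abstract describes. A realistic study would replace the toy learner and data with the method and benchmark datasets used in the paper.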
Keywords :
learning (artificial intelligence); program verification; software metrics; software quality; dimensional complexity reduction; fault content prediction; multidimensional scaling; preprocessing strategy; quality assurance; quality modeling; random forest; semisupervised approach; software component detection; software defect prediction; software metrics; software modules; supervised learning algorithm; validation; verification; Measurement; Prediction algorithms; Predictive models; Semisupervised learning; Software; Software algorithms; Training; dimension reduction; semi-supervised learning; software fault prediction; software metrics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Software and Applications Conference (COMPSAC), 2014 IEEE 38th Annual
Conference_Location :
Vasteras
Type :
conf
DOI :
10.1109/COMPSAC.2014.65
Filename :
6899244