Title :
Improving Cross-Project Defect Prediction Methods with Data Simplification
Author :
Sousuke Amasaki;Kazuya Kawata;Tomoyuki Yokogawa
Author_Institution :
Dept. of Syst. Eng., Okayama Prefectural Univ., Soja, Japan
Abstract :
Context: Cross-project defect prediction (CPDP) has been a popular research topic, and many CPDP methods have been proposed. These methods use cross-project data as is for their inputs, but useless or noisy information in that data can degrade both predictive and computational performance. Removing such information simplifies the cross-project data and can be expected to affect the performance of CPDP methods. Objective: To identify and quantify the effects of data simplification on CPDP methods. Method: We conducted experiments that compared the predictive performance of CPDP with and without data simplification. We adopted a data simplification method based on an active learning method proposed for software effort estimation. The experiments used 44 versions of OSS projects, four prediction models, and two CPDP methods, namely Burak-filter and cross-project selection. Results: Data simplification significantly improved predictive performance for cross-project selection. It did not improve Burak-filter. Conclusion: Data simplification can be helpful for cross-project selection in terms of both predictive performance and size reduction of the cross-project data.
Keywords :
"Predictive models","Measurement","Logistics","Learning systems","Support vector machines","Data models","Software"
Conference_Title :
Software Engineering and Advanced Applications (SEAA), 2015 41st Euromicro Conference on
Electronic_ISBN :
2376-9505
DOI :
10.1109/SEAA.2015.25