Title :
Better cross company defect prediction
Author :
Peters, F.; Menzies, T.; Marcus, Andrian
Author_Institution :
Lane Dept. of CS & EE, West Virginia Univ., Morgantown, WV, USA
Abstract :
How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that relevant training data was found nearest to the local project. But is this the best approach? This paper introduces the Peters filter which is based on the following conjecture: When local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small data-sets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
Keywords :
data mining; learning (artificial intelligence); software quality; Burak filter-cross-company approach; Peters filter-cross-company approach; cross-company defect prediction; cross-company learning; local data; quality prediction; state-of-the-art relevancy filter; training data; within-company learning; within-company predictors; Companies; Estimation; Predictive models; Radio frequency; Software; Training data; Vegetation; Cross company; data mining; defect prediction
Conference_Title :
Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-0345-0
DOI :
10.1109/MSR.2013.6624057