Regional Information Center for Science and Technology - Learning from Open-Source Projects: An Empirical Study on Defect Prediction

DocumentCode :

652630

Title :

Learning from Open-Source Projects: An Empirical Study on Defect Prediction

Author :

Zhimin He ; Peters, F. ; Menzies, T. ; Ye Yang

Author_Institution :

Lab. for Internet Software Technol., Inst. of Software, Beijing, China

fYear :

2013

fDate :

10-11 Oct. 2013

Firstpage :

Lastpage :

Abstract :

The fundamental issue in cross-project defect prediction is selecting the most appropriate training data for creating quality defect predictors. Another concern is whether historical data from open-source projects can, from a practical point of view, be used to create quality predictors for proprietary projects. Current studies have proposed statistical approaches to finding such training data; however, no apparent effort has yet been made to study their success on proprietary data. These methods also apply brute-force techniques, which are computationally expensive. In this work, we introduce a novel data selection procedure that takes into account the similarity between the distributions of the test data and the potential training data. Additionally, we use feature subset selection to increase the similarity between the test and training sets. Our procedure provides a comparable and scalable means of solving the cross-project defect prediction problem for creating quality defect predictors. To evaluate our procedure, we conducted empirical studies with comparisons to within-company defect prediction and a relevancy filtering method. We found that our proposed method performs relatively better than the filtering method in terms of both computation cost and prediction performance.
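
The abstract does not spell out how distribution similarity is measured, so the following Python sketch is only an illustration of the general idea of ranking candidate training sets by how closely their distribution resembles the test data. It is not the authors' procedure. The choice of per-feature mean and standard deviation as the summary, the use of Euclidean distance, and all names (`summarize`, `distribution_distance`, `rank_training_sets`, `project_a`, `project_b`) are assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def summarize(rows):
    """Per-feature (mean, population std) summary of a dataset.

    `rows` is a list of equally long numeric rows (one row per module).
    The summary statistics are an illustrative assumption, not the
    characteristics used in the paper.
    """
    summary = []
    for column in zip(*rows):
        summary.extend([mean(column), pstdev(column)])
    return summary


def distribution_distance(rows_a, rows_b):
    """Euclidean distance between the summary vectors of two datasets."""
    return math.dist(summarize(rows_a), summarize(rows_b))


def rank_training_sets(test_rows, candidates):
    """Return candidate names ordered from most to least similar to the test data."""
    return sorted(
        candidates,
        key=lambda name: distribution_distance(candidates[name], test_rows),
    )


# Hypothetical toy data: two metrics per module.
test_rows = [[10.0, 1.0], [12.0, 3.0], [14.0, 2.0]]
candidates = {
    "project_a": [[11.0, 1.0], [13.0, 3.0], [15.0, 2.0]],
    "project_b": [[100.0, 50.0], [120.0, 70.0], [140.0, 60.0]],
}
ranking = rank_training_sets(test_rows, candidates)
```

Ranking whole candidate sets by a compact summary avoids comparing every training instance with every test instance, which is the kind of cost the abstract attributes to brute-force techniques.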

Keywords :

learning (artificial intelligence); program debugging; project management; public domain software; statistical analysis; brute force techniques; company defect prediction; computation cost; cross project defect prediction problem; data selection procedure; feature subset selection; open-source project learning; prediction performance; proprietary data; proprietary projects; quality defect predictor creation; relevancy filtering method; statistical approach; test distribution; test-training set similarity; training data; Data models; Filtering; Open source software; Predictive models; Training; Training data; cross-project; data similarity; feature subset selection; instance selection; software defect prediction;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Empirical Software Engineering and Measurement, 2013 ACM / IEEE International Symposium on

Conference_Location :

Baltimore, MD

ISSN :

1938-6451

Print_ISBN :

978-0-7695-5056-5

Type :

conf

DOI :

10.1109/ESEM.2013.20

Filename :

6681337

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=652630