DocumentCode :
1018786
Title :
Classifying Software Changes: Clean or Buggy?
Author :
Kim, Sunghun ; Whitehead, E. James, Jr. ; Zhang, Yi
Author_Institution :
Massachusetts Inst. of Technol., Cambridge
Volume :
34
Issue :
2
fYear :
2008
Firstpage :
181
Lastpage :
196
Abstract :
This paper introduces a new technique for predicting latent software bugs, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project stored in its software configuration management repository. The trained classifier can classify changes as buggy or clean, with a 78 percent accuracy and a 60 percent buggy change recall on average. Change classification has several desirable qualities: 1) The prediction granularity is small (a change to a single file), 2) predictions do not require semantic information about the source code, 3) the technique works for a broad array of project types and programming languages, and 4) predictions can be made immediately upon the completion of a change. Contributions of this paper include a description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
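The workflow the abstract describes (extract features from past changes, train a classifier, then label a new change and report accuracy and buggy-change recall) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the learning algorithm or the exact features, so this uses a small multinomial naive Bayes model over bag-of-words tokens, and `HISTORY`, `ChangeClassifier`, and `accuracy_and_buggy_recall` are hypothetical names with made-up toy data, not the authors' implementation or results.

```python
import math
from collections import Counter

# Hypothetical toy history: (change text, label). Real features would come
# from a project's revision history; these strings are invented.
HISTORY = [
    ("add unchecked cast in parser loop", "buggy"),
    ("remove null check before pointer use", "buggy"),
    ("add unchecked index in buffer loop", "buggy"),
    ("update readme wording", "clean"),
    ("rename variable for readability", "clean"),
    ("add unit test for parser", "clean"),
]


def tokenize(change_text):
    # Bag-of-words features: no semantic analysis of the code is required.
    return change_text.lower().split()


class ChangeClassifier:
    """Multinomial naive Bayes over change tokens (assumed stand-in model)."""

    def __init__(self):
        self.token_counts = {"buggy": Counter(), "clean": Counter()}
        self.change_counts = Counter()
        self.vocabulary = set()

    def train(self, labeled_changes):
        # Assumes at least one training change exists for each label.
        for text, label in labeled_changes:
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.change_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, text):
        total_changes = sum(self.change_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("buggy", "clean"):
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.change_counts[label] / total_changes)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy_and_buggy_recall(classifier, labeled_changes):
    # Accuracy: share of all changes labeled correctly.
    # Buggy recall: share of truly buggy changes that were flagged as buggy.
    correct = buggy_total = buggy_found = 0
    for text, label in labeled_changes:
        predicted = classifier.classify(text)
        correct += predicted == label
        if label == "buggy":
            buggy_total += 1
            buggy_found += predicted == "buggy"
    recall = buggy_found / buggy_total if buggy_total else 0.0
    return correct / len(labeled_changes), recall


classifier = ChangeClassifier()
classifier.train(HISTORY)
new_change_label = classifier.classify("remove null check in buffer loop")
```

Because the classifier only needs the text of a single change, it can be queried as soon as that change is complete, which matches the file-level granularity and immediacy listed in the abstract. A realistic evaluation would measure accuracy and recall on held-out changes rather than on the training history.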
Keywords :
data mining; feature extraction; learning (artificial intelligence); program debugging; software maintenance; software metrics; association rule; change classification; machine learning classifier; open source projects; programming languages; software change; software configuration management repository; software project; clustering, classification, and association rules; configuration management; metrics/measurement
fLanguage :
English
Journal_Title :
Software Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
0098-5589
Type :
jour
DOI :
10.1109/TSE.2007.70773
Filename :
4408585