Title :
Filtering noise in mixed-purpose fixing commits to improve defect prediction and localization
Author :
Hoan Anh Nguyen ; Anh Tuan Nguyen ; Tuan N. Nguyen
Author_Institution :
Electr. & Comput. Eng. Dept., Iowa State Univ., Ames, IA, USA
Abstract :
In open-source software projects, developers fixing software faults sometimes also make non-fixing code changes, such as functionality enhancements, code restructuring or improvement, and documentation updates, and commit them together with the fixing changes in the same transaction. We call such commits mixed-purpose fixing commits (MFCs). We conducted an empirical study of MFCs in several popular open-source projects. Our results show that MFCs account for about 11%-39% of all fixing commits. In 3%-41% of MFCs, developers performed other types of changes without indicating them in the commit logs. Our study also shows that mining software repositories (MSR) approaches that rely on recovering the history of fixed/buggy files are affected by noisy data in which non-fixing changes in MFCs are treated as fixing ones. These results motivated us to develop Cardo, a tool that identifies MFCs and filters non-fixing changed files out of the change sets of fixing commits. It uses natural language processing to analyze the sentences in commit logs and program analysis to cluster the changes in each change set, in order to determine whether a changed file is a non-fixing one. Our empirical evaluation on several open-source projects shows that Cardo achieves 93% precision on average, and that existing MSR approaches improve by up to 32% in relative terms when given data filtered by Cardo.
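The idea of flagging an MFC from its commit log can be illustrated with a minimal Python sketch. This is not Cardo's method: the abstract states that Cardo uses natural language processing and program analysis, whereas the sketch below substitutes a crude keyword lookup. The keyword lists and the function names `classify_log` and `is_mixed_purpose` are assumptions made up for illustration.

```python
import re

# Hypothetical keyword lists, chosen for illustration only; they are
# not taken from the paper.
FIX_WORDS = {"fix", "fixes", "fixed", "bug", "defect", "fault"}
NON_FIX_WORDS = {
    "refactor": "restructuring",
    "cleanup": "restructuring",
    "rename": "restructuring",
    "add": "enhancement",
    "feature": "enhancement",
    "javadoc": "documentation",
    "doc": "documentation",
    "comment": "documentation",
}


def classify_log(message):
    """Return (is_fixing, non_fixing_purposes) for one commit log message."""
    # Lowercase the message and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", message.lower())
    is_fixing = any(t in FIX_WORDS for t in tokens)
    purposes = {NON_FIX_WORDS[t] for t in tokens if t in NON_FIX_WORDS}
    return is_fixing, purposes


def is_mixed_purpose(message):
    """True if the log mentions a fix and at least one non-fixing purpose."""
    is_fixing, purposes = classify_log(message)
    return is_fixing and bool(purposes)
```

A log-only check like this cannot detect the MFCs in which developers did not mention the other change types in the commit log, which the abstract reports as 3%-41% of MFCs; handling those is presumably why the change sets themselves are also analyzed.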
Keywords :
program debugging; program diagnostics; public domain software; Cardo; MFCs; MSR approaches; buggy files; commit logs; defect localization; defect prediction; fixed files; mining software repositories approaches; mixed-purpose fixing commits; natural language processing; noise filtering; non-fixing changed files filtering; open-source software projects; program analysis; Data mining; Documentation; Feature extraction; History; Noise; Open source software
Conference_Title :
2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE)
Conference_Location :
Pasadena, CA
DOI :
10.1109/ISSRE.2013.6698913