DocumentCode :
650705
Title :
Mining Software Repositories for Accurate Authorship
Author :
Meng, Xiaozhu ; Miller, Barton P. ; Williams, William R. ; Bernat, Andrew R.
Author_Institution :
Comput. Sci. Dept., Univ. of Wisconsin, Madison, WI, USA
fYear :
2013
fDate :
22-28 Sept. 2013
Firstpage :
250
Lastpage :
259
Abstract :
Code authorship information is important for analyzing software quality, performing software forensics, and improving software maintenance. However, current tools assume that the last developer to change a line of code is its author, regardless of all earlier changes. This approximation loses important information. We present two new line-level authorship models to overcome this limitation. We first define the repository graph, a graph abstraction of a code repository in which nodes are commits and edges represent the development dependencies between them. Then, for each line of code, we define two models. Structural authorship is a subgraph of the repository graph that records all commits that changed the line and the development dependencies between those commits. Weighted authorship is a vector of author contribution weights derived from the structural authorship of the line and based on a code change measure between commits, for example, best edit distance. We have implemented both authorship models as a new git built-in tool, git-author. We evaluated git-author in an empirical study and a comparison study. In the empirical study, we ran git-author on five open source projects and found that it recovers more information than a current tool (git-blame) for about 10% of lines. In the comparison study, we used git-author to build a line-level model for bug prediction and compared it with an existing file-level model. The results show that our line-level model performs consistently better than the file-level model when evaluated on our data sets produced from the Apache HTTP server project.
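The weighted authorship idea described in the abstract can be illustrated with a minimal Python sketch. This is not the git-author implementation. It assumes a hypothetical input format in which the structural authorship of one line is a dictionary mapping each commit id to its author, the line's text at that commit, and the ids of its parent commits within the subgraph. It also substitutes plain Levenshtein distance for the paper's "best edit distance", takes the smallest distance over all parents for a commit with several parents, and credits a root commit with the full length of the line. All function and variable names are illustrative.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def weighted_authorship(structural):
    """Turn one line's structural authorship into per-author weights.

    structural: dict commit_id -> (author, line_text, [parent_commit_ids])
    (hypothetical format, assumed for this sketch).
    Returns a dict author -> weight; weights sum to 1 when any change exists.
    """
    raw = defaultdict(int)
    for author, text, parents in structural.values():
        if parents:
            # Assumption: measure change against the closest parent version.
            change = min(levenshtein(structural[p][1], text) for p in parents)
        else:
            # Assumption: the commit that introduced the line wrote all of it.
            change = len(text)
        raw[author] += change
    total = sum(raw.values())
    if total == 0:
        return {author: 0.0 for author in raw}
    return {author: amount / total for author, amount in raw.items()}


if __name__ == "__main__":
    history = {
        "c1": ("alice", "int x = 0;", []),
        "c2": ("bob", "int x = 1;", ["c1"]),
        "c3": ("alice", "long x = 1;", ["c2"]),
    }
    print(weighted_authorship(history))
```

A last-change tool such as git-blame would attribute the final line in this example entirely to the author of `c3`, whereas the weight vector also retains the contribution of the author of `c2`.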
Keywords :
data mining; digital forensics; graph theory; program debugging; software maintenance; software quality; Apache HTTP server project; bug prediction; code authorship information; code repository; graph abstraction; repository graph; software forensics; software repositories mining; Data models; History; Predictive models; Radio access networks; Vectors; Author contribution; Line-level bug prediction; Version control system
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Maintenance (ICSM), 2013 29th IEEE International Conference on
Conference_Location :
Eindhoven
ISSN :
1063-6773
Type :
conf
DOI :
10.1109/ICSM.2013.36
Filename :
6676896