DocumentCode :
3442815
Title :
Mining GitHub: Why Commit Stops -- Exploring the Relationship between Developer's Commit Pattern and File Version Evolution
Author :
Yang Weicheng ; Shen Beijun ; Xu Ben
Author_Institution :
Sch. of Software Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume :
2
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
165
Lastpage :
169
Abstract :
When using freeware hosted on GitHub, we are often confused by a phenomenon: new versions are released at irregular intervals, and developers often commit nothing for a long time. This evolution pattern interferes with our own development plans and architecture design. Why do updates happen when they do? Can we predict GitHub software version evolution from developers' activities? This paper explores developer commit patterns in GitHub and tries to mine the relationship, if any exists, between these patterns and code evolution. First, we define four metrics to measure commit activity and code evolution: the changes in each commit, the time between two commits, the author of each change, and source code dependency. Then, we adopt visualization techniques to explore developers' commit activity and code evolution; these techniques describe the progress of a given project and its authors' contributions. To analyze the commit logs in a GitHub software repository automatically, we design and implement the Commits Analysis Tool (CAT). Finally, we analyze eight open source projects in GitHub using CAT and find that: 1) file changes in previous versions may affect the files that depend on them in the next version, and 2) the average number of days around a "huge commit" is 3 times that around a normal commit. Using these two patterns and a developer's commit model, we can predict when that developer's next commit will come and which files may be changed in it. Such information is valuable for project planning, both for GitHub projects and for other projects that use GitHub freeware to develop software.
Keywords :
data mining; data visualisation; software maintenance; CAT; GitHub freeware; GitHub software repository mining; GitHub software version evolution; code evolution; commit pattern; commits analysis tool; file version evolution; open source project; pattern mining; visualization technique; Data mining; Data visualization; History; Java; Measurement; Software; Visualization; GitHub; commit pattern; repository mining; version evolution; visualization technology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering Conference (APSEC), 2013 20th Asia-Pacific
Conference_Location :
Bangkok
ISSN :
1530-1362
Print_ISBN :
978-1-4799-2143-0
Type :
conf
DOI :
10.1109/APSEC.2013.133
Filename :
6754372