DocumentCode :
2401239
Title :
GHTorrent: Github's data from a firehose
Author :
Gousios, Georgios ; Spinellis, Diomidis
Author_Institution :
Dept. of Manage. Sci. & Technol., Athens Univ. of Econ. & Bus., Athens, Greece
fYear :
2012
fDate :
2-3 June 2012
Firstpage :
12
Lastpage :
21
Abstract :
A common requirement of many empirical software engineering studies is the acquisition and curation of data from software repositories. During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve both the commits to the projects' repositories and events generated through user actions on project resources. GHTorrent aims to create a scalable offline mirror of GitHub's event streams and persistent data, and offer it to the research community as a service. In this paper, we present the project's design and initial implementation and demonstrate how the provided datasets can be queried and processed.
Keywords :
application program interfaces; data acquisition; public domain software; software engineering; storage management; GHTorrent; Github data; REST API; collaboration platform; data acquisition; data curation; mirroring platform; open source software; project hosting platform; project resources; software engineering studies; software repositories; user actions; Communities; Distributed databases; Electronic mail; Organizations; Peer to peer computing; Protocols; GitHub; commits; dataset; events; repository;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on
Conference_Location :
Zurich
ISSN :
2160-1852
Print_ISBN :
978-1-4673-1760-3
Type :
conf
DOI :
10.1109/MSR.2012.6224294
Filename :
6224294