Title :
GHTorrent: Github's data from a firehose
Author :
Gousios, Georgios ; Spinellis, Diomidis
Author_Institution :
Department of Management Science and Technology, Athens University of Economics and Business, Athens, Greece
Abstract :
A common requirement of many empirical software engineering studies is the acquisition and curation of data from software repositories. During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve both the commits to the projects' repositories and events generated through user actions on project resources. GHTorrent aims to create a scalable offline mirror of GitHub's event streams and persistent data, and offer it to the research community as a service. In this paper, we present the project's design and initial implementation and demonstrate how the provided datasets can be queried and processed.
Keywords :
application program interfaces; data acquisition; public domain software; software engineering; storage management; GHTorrent; Github data; REST API; collaboration platform; data curation; mirroring platform; open source software; project hosting platform; project resources; software engineering studies; software repositories; user actions; Communities; Distributed databases; Electronic mail; Organizations; Peer to peer computing; Protocols; GitHub; commits; dataset; events; repository
Conference_Title :
Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on
Conference_Location :
Zurich
Print_ISBN :
978-1-4673-1760-3
DOI :
10.1109/MSR.2012.6224294