Title :
The GHTorrent dataset and tool suite
Author :
Gousios, Georgios
Author_Institution :
Software Eng. Res. Group, Delft Univ. of Technol., Delft, Netherlands
Abstract :
During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.
Keywords :
application program interfaces; groupware; information resources; information retrieval; software engineering; GHTorrent dataset; GitHub; collaboration platform; extensive REST API; high-quality interconnected data retrieval; hosting platform; mirroring platform; tool suite; Collaboration; Data collection; Data mining; Databases; History; Organizations; Software engineering; GitHub; dataset; repository;
Conference_Titel :
Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-0345-0
DOI :
10.1109/MSR.2013.6624034