• DocumentCode
    3407045
  • Title

    The GHTorrent dataset and tool suite

  • Author

    Gousios, Georgios

  • Author_Institution
    Software Eng. Res. Group, Delft Univ. of Technol., Delft, Netherlands
  • fYear
    2013
  • fDate
    18-19 May 2013
  • Firstpage
    233
  • Lastpage
    236
  • Abstract
    During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.
  • Keywords
    application program interfaces; groupware; information resources; information retrieval; software engineering; GHTorrent dataset; GitHub; collaboration platform; extensive REST API; high-quality interconnected data retrieval; hosting platform; mirroring platform; tool suite; Collaboration; Data collection; Data mining; Databases; History; Organizations; Software engineering; GitHub; dataset; repository;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    2160-1852
  • Print_ISBN
    978-1-4799-0345-0
  • Type

    conf

  • DOI
    10.1109/MSR.2013.6624034
  • Filename
    6624034