DocumentCode
3407306
Title
An unabridged source code dataset for research in software reuse
Author
Janjic, Werner ; Hummel, Oliver ; Schumacher, Markus ; Atkinson, Colin
Author_Institution
Software-Eng. Group, Univ. of Mannheim, Mannheim, Germany
fYear
2013
fDate
18-19 May 2013
Firstpage
339
Lastpage
342
Abstract
This paper describes a large, unabridged data-set of Java source code gathered and shared as part of the Merobase Component Finder project of the Software-Engineering Group at the University of Mannheim. It consists of the complete index used to drive the search engine, www.merobase.com, the vast majority of the source code modules accessible through it, and a tool that enables researchers to efficiently browse the collected data. We describe the techniques used to collect, format and store the data set, as well as the core capabilities of the Merobase search engine such as classic keyword-based, interface-based and test-driven search. This data-set, which represents one of the largest searchable collections of source and binary modules available online, has been recently made available for download and use in further research projects. All files are available at http://merobase.informatik.uni-mannheim.de/sources/.
Keywords
Java; search engines; software reusability; source coding; Java source code; Merobase component finder project; Merobase search engine; University of Mannheim; binary modules; interface-based search; keyword-based search; software engineering group; software reuse; source code modules; test-driven search; unabridged source code dataset; Containers; Indexes; Java; Open source software; Relational databases; Search engines
fLanguage
English
Publisher
ieee
Conference_Titel
Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on
Conference_Location
San Francisco, CA
ISSN
2160-1852
Print_ISBN
978-1-4799-0345-0
Type
conf
DOI
10.1109/MSR.2013.6624047
Filename
6624047