DocumentCode :
2458445
Title :
A Dataset Search Engine for the Research Document Corpus
Author :
Lu, Meiyu ; Bangalore, Srinivas ; Cormode, Graham ; Hadjieleftheriou, Marios ; Srivastava, Divesh
Author_Institution :
Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2012
fDate :
1-5 April 2012
Firstpage :
1237
Lastpage :
1240
Abstract :
A key step in validating a proposed idea or system is to evaluate it over a suitable dataset. However, to date there have been no useful tools that help researchers understand which datasets have been used for what purpose, or in what prior work. Instead, researchers must manually browse through papers to find suitable datasets and their corresponding URLs, which is laborious and inefficient. To aid the dataset discovery process, and to provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are the identification of datasets and the discovery of the association between a dataset and the URLs where it can be accessed. Based on this framework, we have built a user-friendly web-based search interface that lets users conveniently explore the dataset-paper relationships and find relevant datasets and their properties.
Keywords :
data analysis; document handling; search engines; user interfaces; URL; dataset discovery process; dataset search engine; dataset-paper relationships; research document corpus; scientific corpus; user friendly Web-based search interface; Benchmark testing; Bibliographies; Data mining; Feature extraction; Libraries; Portable document format; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
ISSN :
1063-6382
Print_ISBN :
978-1-4673-0042-1
Type :
conf
DOI :
10.1109/ICDE.2012.80
Filename :
6228177