DocumentCode :
1791784
Title :
Dealing with heterogeneous big data when geoparsing historical corpora
Author :
Rupp, C.J. ; Rayson, Paul ; Gregory, Ian ; Hardie, Andrew ; Joulain, Amelia ; Hartmann, Daniel
Author_Institution :
Lancaster Univ., Lancaster, UK
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
80
Lastpage :
83
Abstract :
It has long been known that 'variety' is one of the key challenges and opportunities of big data. This is especially true when we consider the variety of content in historical corpora resulting from large-scale digitisation activities. Collections such as Early English Books Online (EEBO) and the British Library 19th Century Newspapers are extremely large and heterogeneous data sources containing a variety of content in terms of time, location, topic, style and quality. The range of geographical locations referenced in these corpora poses a difficult challenge for state-of-the-art geoparsing tools. In the context of our work on Spatial Humanities analyses, we present our solution for dealing with the variety and scale of these corpora.
Keywords :
Big Data; art; libraries; publishing; British Library 19th Century Newspapers; Early English Books Online; art geoparsing tools; geoparsing historical corpora; heterogeneous big data; heterogeneous data sources; large-scale digitisation activities; spatial humanities analyses; Big data; Context; Diseases; Geographic information systems; Geography; Libraries; Pipelines; Historical Corpora; NLP Pipelines and Workflows; Text mining; Toponym Resolution
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004457
Filename :
7004457