Title :
Customising geoparsing and georeferencing for historical texts
Author :
Rupp, C.J. ; Rayson, Paul ; Baron, A. ; Donaldson, Christopher ; Gregory, Ian ; Hardie, Andrew ; Murrieta-Flores, Patricia
Author_Institution :
Lancaster Univ., Lancaster, UK
Abstract :
To better support the text mining of historical texts, we propose a combination of complementary techniques from Geographical Information Systems, computational linguistics and corpus linguistics. In previous work, we described this as 'visual gisting': extracting important themes from a text and locating those themes on a map that represents the geographical information the text contains. Here, we describe the steps that were found necessary to apply standard analysis and resolution tools to identify place names in a specific corpus of historical texts. This task is a prerequisite for further analysis and comparison, in which the information we extract from a corpus is combined with information from other sources, including other text corpora. The process is intended to support close reading of historical texts on a much larger scale by using exploratory, data-driven approaches to highlight which parts of the corpus warrant further close analysis. Our case study is drawn from a corpus of Lake District travel literature. We discuss the customisations that we had to make to existing tools in order to extract place name information and visualise it on a map.
Keywords :
computational linguistics; data mining; geographic information systems; history; text analysis; Geographical Information Systems; Lake District travel literature; corpus linguistics; geoparsing; georeferencing; historical texts; information extraction; place name identification; text corpora; text mining; Gold; Lakes; Optical character recognition software; Pragmatics; Standards; Historical Corpora; Toponym Resolution
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691671