DocumentCode :
180682
Title :
Ge(o)Lo(cator): Geographic Information Extraction from Unstructured Text Data and Web Documents
Author :
Nesi, Paolo ; Pantaleo, Gianni ; Tenti, Marco
Author_Institution :
Dept. of Inf. Eng., Univ. of Florence, Florence, Italy
fYear :
2014
fDate :
6-7 Nov. 2014
Firstpage :
60
Lastpage :
65
Abstract :
The constantly growing number of websites, web pages, documents, and textual (Big) Data populating the Internet represents a massive resource of information and knowledge for various interests across many different domains. However, the sheer volume and complexity of unstructured, natural language textual data make it difficult for end users to find specific, desired pieces of information. In the era of maximum uptake of social networks and media, automatic extraction and retrieval of geographic information is becoming a field of great interest. In this paper, the GeLo system for extracting addresses and geographical coordinates of companies and organizations from their web domains is presented. The information extraction process relies on NLP techniques, specifically Part-Of-Speech tagging, pattern recognition, and annotation. The overall system performance has been manually evaluated against a consistent subset of the database of extracted URLs.
Keywords :
Big Data; Internet; Web sites; geographic information systems; information retrieval; natural language processing; text analysis; GeLo system; NLP techniques; URL database; Web domains; Web pages; address extraction; automatic geographic information extraction; automatic geographic information retrieval; geographical coordinates; geolocator; part-of-speech-tagging; pattern annotation; pattern recognition; social media; social networks; textual big data; unstructured natural language textual data; Cities and towns; Companies; Data mining; Databases; Semantics; Geocoding; Geographic Information Retrieval; Geoparsing; Web crawling
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Semantic and Social Media Adaptation and Personalization (SMAP), 2014 9th International Workshop on
Conference_Location :
Corfu
Print_ISBN :
978-1-4799-6813-8
Type :
conf
DOI :
10.1109/SMAP.2014.27
Filename :
6978954