Title :
Named entity disambiguation on an ontology enriched by Wikipedia
Author :
Nguyen, Hien T. ; Cao, Tru H.
Author_Institution :
Ton Duc Thang Univ., Ho Chi Minh City
Abstract :
Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model not only to disambiguate but also to identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
Keywords :
classification; learning (artificial intelligence); ontologies (artificial intelligence); text analysis; Wikipedia; annotated corpus; machine learning; named entity disambiguation; ontology; Cities and towns; Data mining; Feature extraction; Knowledge based systems; Ontologies; Robustness; Social network services; Training data; annotation; entity disambiguation; knowledge base; named entity
Conference_Title :
Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4244-2379-8
Electronic_ISBN :
978-1-4244-2380-4
DOI :
10.1109/RIVF.2008.4586363