Named entity disambiguation on an ontology enriched by Wikipedia

Author

Nguyen, Hien T. ; Cao, Tru H.

Author_Institution

Ton Duc Thang Univ., Ho Chi Minh City

fYear

2008

fDate

13-17 July 2008

Firstpage

247

Lastpage

254

Abstract

Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying the combinations used to represent named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.

Keywords

classification; learning (artificial intelligence); ontologies (artificial intelligence); text analysis; Wikipedia; annotated corpus; machine learning; named entity disambiguation; ontology; Cities and towns; Data mining; Feature extraction; Knowledge based systems; Ontologies; Robustness; Social network services; Training data; annotation; entity disambiguation; knowledge base; named entity

fLanguage

English

Publisher

IEEE

Conference_Titel

Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on

Conference_Location

Ho Chi Minh City

Print_ISBN

978-1-4244-2379-8

Electronic_ISBN

978-1-4244-2380-4

Type

conf

DOI

10.1109/RIVF.2008.4586363

Filename

4586363