DocumentCode
2379439
Title
Named entity disambiguation on an ontology enriched by Wikipedia
Author
Nguyen, Hien T. ; Cao, Tru H.
Author_Institution
Ton Duc Thang Univ., Ho Chi Minh City
fYear
2008
fDate
13-17 July 2008
Firstpage
247
Lastpage
254
Abstract
The shortage of training data is currently a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model not only to disambiguate but also to identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of these features on disambiguation accuracy by varying the combinations used to represent named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
Keywords
classification; learning (artificial intelligence); ontologies (artificial intelligence); text analysis; Wikipedia; annotated corpus; machine learning; named entity disambiguation; ontology; Cities and towns; Data mining; Feature extraction; Knowledge based systems; Ontologies; Robustness; Social network services; Training data; annotation; entity disambiguation; knowledge base; named entity
fLanguage
English
Publisher
IEEE
Conference_Titel
Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on
Conference_Location
Ho Chi Minh City
Print_ISBN
978-1-4244-2379-8
Electronic_ISBN
978-1-4244-2380-4
Type
conf
DOI
10.1109/RIVF.2008.4586363
Filename
4586363