• DocumentCode
    2379439
  • Title
    Named entity disambiguation on an ontology enriched by Wikipedia
  • Author
    Nguyen, Hien T. ; Cao, Tru H.
  • Author_Institution
    Ton Duc Thang Univ., Ho Chi Minh City
  • fYear
    2008
  • fDate
    13-17 July 2008
  • Firstpage
    247
  • Lastpage
    254
  • Abstract
    Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model not only to disambiguate but also to identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
  • Keywords
    classification; learning (artificial intelligence); ontologies (artificial intelligence); text analysis; Wikipedia; annotated corpus; machine learning; named entity disambiguation; ontology; Cities and towns; Data mining; Feature extraction; Knowledge based systems; Ontologies; Robustness; Social network services; Training data; annotation; entity disambiguation; knowledge base; named entity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Research, Innovation and Vision for the Future, 2008. RIVF 2008. IEEE International Conference on
  • Conference_Location
    Ho Chi Minh City
  • Print_ISBN
    978-1-4244-2379-8
  • Electronic_ISBN
    978-1-4244-2380-4
  • Type
    conf
  • DOI
    10.1109/RIVF.2008.4586363
  • Filename
    4586363