  • DocumentCode
    3467186
  • Title
    Nested Named Entity Recognition in Historical Archive Text
  • Author
    Byrne, Kate
  • Author_Institution
    Univ. of Edinburgh, Edinburgh
  • fYear
    2007
  • fDate
    17-19 Sept. 2007
  • Firstpage
    589
  • Lastpage
    596
  • Abstract
    This paper describes work on Named Entity Recognition (NER), in preparation for Relation Extraction (RE), on data from a historical archive organisation. As is often the case in the cultural heritage domain, the source text includes a high percentage of specialist terminology, and is of very variable quality in terms of grammaticality and completeness. The NER and RE tasks were carried out using a specially annotated corpus, and are themselves preliminary steps in a larger project whose aim is to transform discovered relations into a graph structure that can be queried using standard tools. Experimental results from the NER task are described, with emphasis on dealing with nested entities using a multi-word token method. The overall objective is to improve access by non-specialist users to a valuable cultural resource.
  • Keywords
    information retrieval systems; relational databases; historical archive organisation; historical archive text; multi-word token method; nested named entity recognition; relation extraction; source text; Cultural differences; Data mining; Database languages; Government; Informatics; Internet; Resource description framework; Scattering; Terminology; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing, 2007. ICSC 2007. International Conference on
  • Conference_Location
    Irvine, CA
  • Print_ISBN
    978-0-7695-2997-4
  • Type
    conf
  • DOI
    10.1109/ICSC.2007.107
  • Filename
    4338398