• DocumentCode
    1825576
  • Title

    The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection

  • Author

    Crane, Gregory ; Jones, Alison

  • Author_Institution
    Perseus Project, Tufts Univ., Medford, MA
  • fYear
    2006
  • fDate
    38869
  • Firstpage
    31
  • Lastpage
    40
  • Abstract
    This paper evaluates automatic extraction of ten named entity classes from a 19th-century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond. It analyzes success with these ten categories of entities, which are prominent in these newspapers, and the particular problems that they raise. Personal and place names are familiar, but some more important categories (such as ship names and military units) illustrate the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.
  • Keywords
    digital libraries; history; information analysis; information retrieval; meta data; 19th-century newspaper collection; Civil War years; IMLS; Richmond Times Dispatch; Virginia Banks; automatic extraction; automatic metadata generation; digital library; machine readable reference collections; named entity analysis; Abstracts; Assembly; Cranes; Encyclopedias; Information retrieval; Job listing service; Marine vehicles; Oceans; Permission; Software libraries; historical newspapers; named entity recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Chapel Hill, NC
  • Print_ISBN
    1-59593-354-9
  • Type

    conf

  • DOI
    10.1145/1141753.1141759
  • Filename
    4119094