DocumentCode
1825576
Title
The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection
Author
Crane, Gregory ; Jones, Alison
Author_Institution
Perseus Project, Tufts Univ., Medford, MA
fYear
2006
fDate
38869
Firstpage
31
Lastpage
40
Abstract
This paper evaluates automatic extraction of ten named entity classes from a 19th-century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond. It analyzes success with these ten categories of entities, which are prominent in these newspapers, and the particular problems that these classes of named entities raise. Personal and place names are familiar, but some more important categories (such as ship names and military units) illustrate the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.
Keywords
digital libraries; history; information analysis; information retrieval; meta data; 19th-century newspaper collection; Civil War years; IMLS; Richmond Times Dispatch; Virginia Banks; automatic extraction; automatic metadata generation; digital library; machine readable reference collections; named entity analysis; Abstracts; Assembly; Cranes; Encyclopedias; Information retrieval; Job listing service; Marine vehicles; Oceans; Permission; Software libraries; historical newspapers; named entity recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location
Chapel Hill, NC
Print_ISBN
1-59593-354-9
Type
conf
DOI
10.1145/1141753.1141759
Filename
4119094