Title :
Holistic word recognition for handwritten historical documents
Author :
Lavrenko, Victor ; Rath, Toni M. ; Manmatha, R.
Author_Institution :
Center for Intelligent Inf. Retrieval, Massachusetts Univ., Amherst, MA, USA
Abstract :
Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters), which are recognized separately. The recognition result for a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, motivated by the fact that for severely degraded documents a segmentation of words into characters will produce very poor results. The quality of the original documents does not allow us to recognize them with high accuracy; our goal here is to produce transcriptions that will allow successful retrieval of images, which has been shown to be feasible even in such noisy environments. We believe that this is the first systematic approach to recognizing words in historical manuscripts with extensive experiments. Our experiments show a recognition accuracy of 65%, which exceeds the performance of other systems that operate on non-degraded input images (nonhistorical documents).
Keywords :
document image processing; handwritten character recognition; history; image retrieval; word processing; cognitive psychology; handwritten single-author historical documents; holistic word recognition; Character recognition; Degradation; Handwriting recognition; Humans; Image recognition; Image retrieval; Information retrieval; Libraries; Psychology; Working environment noise
Conference_Title :
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN :
0-7695-2088-X
DOI :
10.1109/DIAL.2004.1263256