Title :
Reading handwritten US census forms
Author :
Madhvanath, S. ; Govindaraju, V. ; Ramanaprasad, V. ; Lee, D.S. ; Srihari, S.N.
Author_Institution :
Center of Excellence for Document Analysis and Recognition, State University of New York, Buffalo, NY, USA
Abstract :
Commercial forms-reading systems for extracting data from forms do not meet acceptable accuracy requirements on forms filled out by hand. In December 1993, NIST called on industry and research organizations working in handwriting recognition to participate in a test to determine the state of the art in the area. A database of form images containing actual responses received by the US Census Bureau was provided. The handwritten responses are very loosely constrained in terms of writing style, format of response, and choice of text. The lexicons provided are very large (about 50,000 entries), and yet their coverage is incomplete (about 70%). In this paper, we discuss the approach taken by CEDAR to automate the task of reading the census forms. The subtasks of field extraction and phrase recognition are described.
Keywords :
document image processing; feature extraction; handwriting recognition; CEDAR; data extraction; database; field extraction; form images; handwritten US census forms reading; lexicons; phrase recognition; Automatic testing; Data mining; Handwriting recognition; Image databases; Image recognition; NIST; System testing; Text analysis; Text recognition; Writing
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.598949