Title :
A system to read names and addresses on tax forms
Author :
Srihari, Sargur N. ; Shin, Yong-Chul ; Ramanaprasad, Vemulapati ; Lee, Dar-Shyang
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
fDate :
7/1/1996
Abstract :
Reading names and addresses is one of the most complex tasks in automated forms processing. This paper describes an integrated real-time system that reads names and addresses on tax forms of the U.S. Internal Revenue Service. The Name and Address Block Reader (NABR) system accepts both machine-printed and hand-printed address block images as input. The application software has two major steps: document analysis (connected component analysis, address block extraction, label detection, and hand-print/machine-print discrimination) and document recognition. Document recognition follows two distinct streams, one for machine-print and one for hand-print; in both, the key steps are address parsing, character recognition, word recognition, and postal database lookup (ZIP+4 and City-State-ZIP files). The system outputs a packet containing the recognition results together with a database access status file. Real-time throughput (8500 forms/h) is achieved with a loosely coupled multiprocessing architecture in which successive input images are distributed to available address recognition processors. The functional architecture, software design, system architecture, and hardware implementation are described. Performance evaluations on machine-printed and handwritten addresses are presented.
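The throughput claim rests on handing each successive image to whichever recognition processor is free. A minimal sketch of that dispatch policy follows, assuming earliest-free assignment over a saturated input queue; the function names and the per-image service times are hypothetical illustrations, since the abstract gives neither processing times nor processor counts. It also shows the sizing arithmetic: at a hypothetical 3.0 s per form, reaching 8500 forms/h needs ceil(8500 × 3.0 / 3600) = 8 processors.

```python
import heapq
import math


def dispatch(service_times, num_processors):
    """Assign images, in arrival order, to the processor that becomes free first.

    service_times: hypothetical per-image recognition times in seconds.
    Assumes every image is already queued at time 0 (saturated input stream).
    Returns (assignments, makespan): assignments[i] is the processor that
    handled image i; makespan is when the last image finishes.
    """
    # Min-heap of (time the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(free_at)
    assignments = []
    makespan = 0.0
    for t in service_times:
        start, p = heapq.heappop(free_at)  # earliest available processor
        end = start + t
        assignments.append(p)
        makespan = max(makespan, end)
        heapq.heappush(free_at, (end, p))
    return assignments, makespan


def throughput_per_hour(num_images, makespan_s):
    """Images completed per hour over the simulated run."""
    return num_images * 3600.0 / makespan_s


def processors_needed(target_per_hour, mean_service_s):
    """Smallest processor count whose combined rate meets the target.

    Each processor completes 3600 / mean_service_s images per hour.
    """
    return math.ceil(target_per_hour * mean_service_s / 3600.0)


if __name__ == "__main__":
    # One slow image (3.0 s) keeps processor 0 busy while processor 1
    # absorbs the three fast ones.
    print(dispatch([3.0, 1.0, 1.0, 1.0], 2))
    print(processors_needed(8500, 3.0))
```

Earliest-free assignment lets a hard image (for example, a difficult hand-printed block) occupy one processor without stalling the stream, which is the property a loosely coupled design provides. A real system would also need to reorder outputs by input sequence, which this sketch omits.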
Keywords :
business forms; character recognition equipment; financial data processing; government data processing; optical character recognition; City-State-ZIP file; Name and Address Block Reader system; US Internal Revenue Service; ZIP+4 file; address block extraction; address parsing; application software; automated processing; character recognition; connected component analysis; database access status file; document analysis; document recognition; hand-printed images; integrated real-time system; label detection; machine-printed images; multiprocessing architecture; postal database lookup; tax forms; word recognition; Application software; Character recognition; Computer architecture; Image databases; Image recognition; Real time systems; Software design; Streaming media; Text analysis; Throughput;
Journal_Title :
Proceedings of the IEEE