DocumentCode
980704
Title
A system to read names and addresses on tax forms
Author
Srihari, Sargur N. ; Shin, Yong-Chul ; Ramanaprasad, Vemulapati ; Lee, Dar-Shyang
Author_Institution
Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
Volume
84
Issue
7
fYear
1996
fDate
7/1/1996 12:00:00 AM
Firstpage
1038
Lastpage
1049
Abstract
The reading of names and addresses is one of the most complex tasks in automated forms processing. This paper describes an integrated real-time system to read names and addresses on tax forms of the U.S. Internal Revenue Service. The Name and Address Block Reader (NABR) system accepts both machine-printed and hand-printed address block images as input. The application software has two major steps: document analysis (connected component analysis, address block extraction, label detection, hand-print/machine-print discrimination) and document recognition. Document recognition has two nonidentical streams, one for machine-print and one for hand-print; the key steps are address parsing, character recognition, word recognition, and postal database lookup (ZIP+4 and City-State-ZIP files). System output is a packet containing the results of recognition together with a database access status file. Real-time throughput (8500 forms/h) is achieved by employing a loosely coupled multiprocessing architecture in which successive input images are distributed to available address recognition processors. The functional architecture, software design, system architecture, and hardware implementation are described. Performance evaluations on machine-printed and handwritten addresses are presented.
Keywords
business forms; character recognition equipment; financial data processing; government data processing; optical character recognition; City-State-ZIP file; Name and Address Block Reader system; US Internal Revenue Service; ZIP+4 file; address block extraction; address parsing; application software; automated processing; character recognition; connected component analysis; database access status file; document analysis; document recognition; hand-printed images; integrated real-time system; label detection; machine-printed images; multiprocessing architecture; postal database lookup; tax forms; word recognition; Application software; Character recognition; Computer architecture; Image databases; Image recognition; Real time systems; Software design; Streaming media; Text analysis; Throughput;
fLanguage
English
Journal_Title
Proceedings of the IEEE
Publisher
IEEE
ISSN
0018-9219
Type
jour
DOI
10.1109/5.503302
Filename
503302