• DocumentCode
    980704
  • Title
    A system to read names and addresses on tax forms
  • Author
    Srihari, Sargur N. ; Shin, Yong-Chul ; Ramanaprasad, Vemulapati ; Lee, Dar-Shyang
  • Author_Institution
    Dept. of Comput. Sci., State Univ. of New York, Buffalo, NY, USA
  • Volume
    84
  • Issue
    7
  • fYear
    1996
  • fDate
    7/1/1996 12:00:00 AM
  • Firstpage
    1038
  • Lastpage
    1049
  • Abstract
    The reading of names and addresses is one of the most complex tasks in automated forms processing. This paper describes an integrated real-time system to read names and addresses on tax forms of the U.S. Internal Revenue Service. The Name and Address Block Reader (NABR) system accepts both machine-printed and hand-printed address block images as input. The application software has two major steps: document analysis (connected component analysis, address block extraction, label detection, hand-print/machine-print discrimination) and document recognition. Document recognition has two nonidentical streams, one for machine-print and one for hand-print; the key steps are address parsing, character recognition, word recognition, and postal database lookup (ZIP+4 and City-State-ZIP files). System output is a packet containing the results of recognition together with a database access status file. Real-time throughput (8500 forms/h) is achieved by employing a loosely coupled multiprocessing architecture in which successive input images are distributed to available address recognition processors. The functional architecture, software design, system architecture, and hardware implementation are described. Performance evaluation on machine-printed and handwritten addresses is presented.
  • Keywords
    business forms; character recognition equipment; financial data processing; government data processing; optical character recognition; City-State-ZIP file; Name and Address Block Reader system; US Internal Revenue Service; ZIP+4 file; address block extraction; address parsing; application software; automated processing; character recognition; connected component analysis; database access status file; document analysis; document recognition; hand-printed images; integrated real-time system; label detection; machine-printed images; multiprocessing architecture; postal database lookup; tax forms; word recognition; Application software; Character recognition; Computer architecture; Image databases; Image recognition; Real time systems; Software design; Streaming media; Text analysis; Throughput;
  • fLanguage
    English
  • Journal_Title
    Proceedings of the IEEE
  • Publisher
    IEEE
  • ISSN
    0018-9219
  • Type
    jour
  • DOI
    10.1109/5.503302
  • Filename
    503302
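
The abstract attributes the 8500 forms/h throughput to distributing successive input images to whichever address recognition processor is available. The following is only a minimal sketch of that scheduling idea and is not the NABR implementation: threads stand in for the separate processors, and `recognize_address_block`, `process_images`, `per_form_budget_seconds`, and the processor counts are hypothetical names and values chosen for illustration. The only figure taken from the record is the 8500 forms/h rate.

```python
from concurrent.futures import ThreadPoolExecutor

FORMS_PER_HOUR = 8500  # throughput figure quoted in the abstract


def per_form_budget_seconds(num_processors: int) -> float:
    """Average time one processor may spend per form while the pool sustains the target rate."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    return num_processors * 3600.0 / FORMS_PER_HOUR


def recognize_address_block(image_id: int) -> dict:
    """Hypothetical stand-in for the recognition work done on one image."""
    # A real recognizer would parse the address, recognize characters and
    # words, and consult the postal database; here we only return a packet.
    return {"image_id": image_id, "status": "recognized"}


def process_images(image_ids, num_processors: int) -> list:
    """Hand each successive image to the next free worker; results keep input order."""
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        return list(pool.map(recognize_address_block, image_ids))
```

Under these assumptions, a single processor would have about 3600 / 8500 ≈ 0.42 s per form, and the per-form budget grows linearly with the number of processors, which is why a slower recognizer can still meet the overall rate when images are spread across several of them.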