Author_Institution :
Nat. Inst. of Stand. & Technol., Gaithersburg, MD, USA
Abstract :
The National Institute of Standards and Technology (NIST) has developed a form-based handprint recognition system for reading information written on forms. This public domain software test-bed may be obtained from NIST free of charge on CD-ROM. The recognition system is modular in design and integrates algorithms from heterogeneous computational paradigms including artificial intelligence, image processing, robust statistics, and pattern recognition. At the core of the system are some 15 libraries containing more than 725 subroutines and 39,000 lines of program code that together define an Application Program Interface (API). Algorithms are provided for conducting generalized form registration, intelligent form removal, adaptive character segmentation, neural network-based classification, and lexical postprocessing. To support these tasks, a host of data structures and interdisciplinary technologies are utilized, including affine image transformations, image morphology, connected image components, principal component feature analysis, and machine learning. Errors within the functional components of the system are complex and non-additive; therefore, system performance must be analyzed within the context of an end-to-end application. This paper provides a functional description of the software system and its architecture, identifies the key technologies utilized, and evaluates the system's performance on a large application.
Keywords :
business forms; document image processing; handwriting recognition; optical character recognition; public domain software; Application Program Interface; adaptive character segmentation; affine image transformations; classification; connected image components; form registration; forms; handprint recognition system; image morphology; intelligent form removal; lexical postprocessing; machine learning; neural network; pattern recognition; principal component feature analysis; recognition system; Algorithm design and analysis; Application software; Artificial intelligence; CD-ROMs; Image recognition; Intelligent systems; NIST; Pattern recognition; Software testing; System performance