Title :
Analyzing form images by using line-shared-adjacent cell relations
Author_Institution :
Res. Lab., IBM Res., Tokyo
Abstract :
We deal with formats whose fields do not have rigidly determined positions and sizes but do have topological relations among them. Such formats are called “topological formats”. The objective of our research is to establish a method for defining a topological format and for detecting fields in images by using that format. The method has the following characteristics: 1) a line-shared-adjacent (LSA) cell relation and an LSA format are proposed, and a topological format can be defined with the LSA format; 2) the concept of a class hierarchy can be applied to the format: a format unification operator is defined to create the hierarchy, can be used to generate a superclass format, and also allows users to generate formats from scanned images; and 3) an LSA format can be converted into an equivalent line-oriented format that can be used for processing actual forms. Since the format consists of line connection information, the method is robust with respect to flaws in the line segments extracted from the images. The method was applied to images of sample forms with various flaws, and satisfactory results were obtained.
Keywords :
business forms; document handling; document image processing; edge detection; feature extraction; optical character recognition; topology; equivalent line-oriented format; form image analysis; format unification operator; line segments; line-shared-adjacent cell relations; superclass format; topological formats; Cities and towns; Data mining; Government; Image analysis; Image converters; Image segmentation; Laboratories; Optical character recognition software; Robustness
Conference_Title :
Proceedings of the 13th International Conference on Pattern Recognition, 1996
Conference_Location :
Vienna
Print_ISBN :
0-8186-7282-X
DOI :
10.1109/ICPR.1996.547272