Title :
Exploration of contextual constraints for character pre-classification
Author :
Ho, Tin Kam ; Nagy, George
fDate :
2001
Abstract :
We present strategies and results for identifying the symbol type (lower case, upper case, digit, or punctuation/special symbol) of every character in a text document, using various kinds of information from neighboring characters. Assuming reasonable word and character segmentation for shape clustering, we designed several type-recognition methods that rely on cluster n-grams, shape codes, and within-word context. On an ASCII test corpus of 925 articles that simulates perfect image-level processing, these methods achieve a substantial improvement over the default assignment of all characters to lower case.
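The task and the baseline described in the abstract can be made concrete with a small sketch. It assumes, hypothetically, that ground-truth types are read directly from the ASCII text (as in a corpus that simulates perfect image-level processing) and that whitespace is excluded from scoring. It labels each character with one of the four symbol types and measures how often the default all-lower-case assignment would be correct. The function names and the whitespace rule are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter

# The four symbol types named in the abstract.
LOWER, UPPER, DIGIT, OTHER = "lower", "upper", "digit", "punct/special"


def symbol_type(ch: str) -> str:
    """Ground-truth type of one ASCII character (assumed labeling rule)."""
    if "a" <= ch <= "z":
        return LOWER
    if "A" <= ch <= "Z":
        return UPPER
    if "0" <= ch <= "9":
        return DIGIT
    return OTHER


def type_histogram(text: str) -> Counter:
    """Count characters of each type, skipping whitespace (assumption)."""
    return Counter(symbol_type(c) for c in text if not c.isspace())


def default_baseline_accuracy(text: str) -> float:
    """Fraction of non-space characters typed correctly when every
    character is simply assigned to lower case (the default baseline)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    correct = sum(1 for c in chars if symbol_type(c) == LOWER)
    return correct / len(chars)


if __name__ == "__main__":
    sample = "Seattle, WA 2001."
    print(type_histogram(sample))
    print(default_baseline_accuracy(sample))
```

On the sample string, 6 of the 15 non-space characters are lower case, so the default assignment is right only 40% of the time. Headings, acronyms, and numbers are exactly where neighboring-character context can help.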
Keywords :
image segmentation; optical character recognition; character preclassification; character segmentation; contextual constraints exploration; default assignment; image-level processing; shape clustering; symbol type; Character recognition; Engines; Frequency; Image segmentation; Optical character recognition software; Shape; Stability; Testing; Text recognition; Tin;
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953830