DocumentCode :
2763528
Title :
Segmentation of low-quality typewritten digits
Author :
Rodriguez, C. ; Muguerza, J. ; Navarro, M. ; Zarate, A. ; Martin, J.I. ; Perez, J.M.
Author_Institution :
Comput. Archit. & Technol. Dept., Basque Country Univ., Donostia, Spain
Volume :
2
fYear :
1998
fDate :
16-20 Aug 1998
Firstpage :
1106
Abstract :
This work addresses the segmentation of numeric fields in forms whose digits are blurred, broken or touching. In an OCR system, the segmentation phase plays a decisive role in the overall accuracy of the system. Segmentation is generally approached in one of two ways: (a) as an isolated phase in the OCR process, or (b) as a phase that interacts with the recognition of the segmented item. In this work, we adopt the first approach and develop a robust new cost function that combines vertical projection, the Tsujimoto metric (1991) and background information. Unlike other techniques reported in the literature, ours obtains a near-optimum number of break points in fields containing broken, blurred and touching characters, leading to high accuracy in the global OCR system. Our experiments on a sample of about 11283 numeric fields in 144 forms (more than 50000 digits of that kind) show that 99.74% of fields were correctly segmented. The new cost function only made 50 errors
Keywords :
image segmentation; optical character recognition; OCR system; Tsujimoto metric; background information; blurring; low-quality typewritten digit segmentation; vertical projection; Character recognition; Computer architecture; Cost function; Electronic mail; Feature extraction; Image quality; Image segmentation; Optical character recognition software; Read only memory; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on
Conference_Location :
Brisbane, Qld.
ISSN :
1051-4651
Print_ISBN :
0-8186-8512-3
Type :
conf
DOI :
10.1109/ICPR.1998.711887
Filename :
711887