DocumentCode :
3436478
Title :
Underline detection and removal in a document image using multiple strategies
Author :
Bai, Zhen-Long; Huo, Qiang
Author_Institution :
Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
Volume :
2
fYear :
2004
fDate :
23-26 Aug. 2004
Firstpage :
578
Abstract :
This work presents a novel three-module approach for underline detection and removal in Chinese/English OCR. The detection module combines connected component analysis with bottom edge analysis. The removal module applies different methods to different kinds of underlines. The disambiguation module compares recognition confidence to reduce the risk of wrongly removing doubtful underlines. Our approach can handle untouched, touched, broken, and slightly curved underlines. In a benchmark test using single text line images extracted from the UW-I database and images captured by C-Pen, we demonstrate that our approach has little negative effect on pure-text images and can reliably detect and remove underlines in text line images that contain them.
Keywords :
natural languages; optical character recognition; C-Pen; Chinese OCR; English OCR; UW-I database; bottom edge analysis; connected component analysis; recognition confidence comparison; single text line images; underline detection module; Benchmark testing; Computer science; Image analysis; Image databases; Image edge detection; Information systems; Optical character recognition software; Pattern analysis; Pattern recognition; Pixel;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-2128-2
Type :
conf
DOI :
10.1109/ICPR.2004.1334314
Filename :
1334314