Title :
Underline detection and removal in a document image using multiple strategies
Author :
Bai, Zhen-Long; Huo, Qiang
Author_Institution :
Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
Abstract :
This work presents a novel three-module approach for underline detection and removal in Chinese/English OCR. The detection module combines two strategies: connected component analysis and bottom edge analysis. The removal module applies different methods to different kinds of underlines. The disambiguation module compares recognition confidences to reduce the risk of wrongly removing doubtful underlines. Our approach handles underlines that are untouched (not touching the characters), touched, broken, or slightly curved. In a benchmark test using single text line images extracted from the UW-I database and images captured by C-Pen, we demonstrate that our approach has little negative effect on pure-text images and reliably detects and removes underlines in text line images that contain them.
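The abstract names connected component analysis and bottom edge analysis as the two detection strategies but gives no details. As an illustration only, the sketch below shows one plausible, generic form of each on a binary text-line image; it is not the authors' method. The image representation (a list of rows of 0/1, 1 = ink), 8-connectivity, the function names, and the thresholds (`min_width_frac`, `min_aspect`, `tol`) are all hypothetical assumptions.

```python
from collections import deque

# Hypothetical representation: a binary text-line image as a list of rows, 1 = ink.


def connected_components(img):
    """Return 8-connected ink components as lists of (row, col) pixels (BFS labelling)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def underline_components(img, min_width_frac=0.5, min_aspect=10.0):
    """Connected-component strategy (hypothetical thresholds): flag long, flat
    components as candidate untouched underlines."""
    w = len(img[0])
    hits = []
    for pixels in connected_components(img):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        comp_w = max(xs) - min(xs) + 1
        comp_h = max(ys) - min(ys) + 1
        if comp_w >= min_width_frac * w and comp_w / comp_h >= min_aspect:
            hits.append(pixels)
    return hits


def bottom_edge(img):
    """Row index of the lowest ink pixel in each column (-1 for empty columns)."""
    h, w = len(img), len(img[0])
    return [max((y for y in range(h) if img[y][x]), default=-1) for x in range(w)]


def flat_bottom_runs(img, min_len_frac=0.5, tol=1):
    """Bottom-edge strategy: return (first_col, last_col) spans where the bottom edge
    is continuous, changes by at most `tol` rows between neighbouring columns (so a
    slight curve is tolerated), and covers at least `min_len_frac` of the width."""
    edge = bottom_edge(img)
    w = len(edge)
    runs, start = [], None
    for x in range(w + 1):
        if x < w and start is not None and edge[x] >= 0 and abs(edge[x] - edge[x - 1]) <= tol:
            continue  # the current run extends to column x
        if start is not None and x - start >= min_len_frac * w:
            runs.append((start, x - 1))
        start = x if x < w and edge[x] >= 0 else None
    return runs


def remove_components(img, comps):
    """Simple removal for untouched underlines: erase the flagged components' pixels."""
    out = [row[:] for row in img]
    for pixels in comps:
        for y, x in pixels:
            out[y][x] = 0
    return out


if __name__ == "__main__":
    # Toy 7x20 line: a vertical stroke, a short bar, and a full-width underline on row 6.
    line = [[0] * 20 for _ in range(7)]
    for y in range(1, 5):
        line[y][2] = 1
    for x in range(5, 9):
        line[3][x] = 1
    line[6] = [1] * 20
    hits = underline_components(line)
    print(len(hits), flat_bottom_runs(line))  # 1 [(0, 19)]
    print(flat_bottom_runs(remove_components(line, hits)))  # []
```

The two tests complement each other: if a character stroke touches the underline, the stroke and the underline merge into one tall component that fails the flatness test, yet the bottom-edge profile still shows a long flat run. Erasing a touched underline without damaging the strokes that cross it needs a different removal method, which this sketch does not attempt.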
Keywords :
natural languages; optical character recognition; C-Pen; Chinese OCR; English OCR; UW-I database; bottom edge analysis; connected component analysis; recognition confidence comparison; single text line images; underline detection module; Benchmark testing; Computer science; Image analysis; Image databases; Image edge detection; Information systems; Optical character recognition software; Pattern analysis; Pattern recognition; Pixel;
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
Print_ISBN :
0-7695-2128-2
DOI :
10.1109/ICPR.2004.1334314