Title :
Text line extraction in document images
Author :
Liuan Wang;Wei Fan;Jun Sun;Satoshi Naoi;Tanaka Hiroshi
Author_Institution :
Fujitsu Research &
Abstract :
Text line extraction in document images is an important prerequisite for many content-based image understanding applications. In this paper, we propose an accurate and robust method for generic text line extraction that can be applied to a wide range of document image categories, diverse languages, and text lines with different orientations. First, candidate connected components are extracted from the document image using Maximally Stable Extremal Regions (MSER), and noise components are filtered out by AdaBoost and a Convolutional Neural Network (CNN). Then, coarse text lines are generated by hierarchical edge reconstruction and cut according to the local linearity of text lines in the document spanning tree. Finally, for accurate text line extraction, the cut multi-component segments are reconnected by minimizing a text line energy defined in terms of text line consistency and fitting error. Experimental results on a multilingual test dataset demonstrate the effectiveness and robustness of the proposed method, which yields higher performance than state-of-the-art methods.
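The spanning-tree step can be illustrated with a rough sketch. It is not the authors' implementation: it assumes each connected component has already been reduced to a centroid, and it swaps the paper's local-linearity cut for a simpler rule that removes spanning-tree edges much longer than the median edge. The function names and the `cut_factor` parameter are hypothetical.

```python
import math
from statistics import median


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def coarse_text_lines(points, cut_factor=2.0):
    """Group centroids by cutting MST edges longer than cut_factor * median edge length.

    Assumption: edge length stands in for the paper's local-linearity criterion.
    """
    edges = minimum_spanning_tree(points)
    if not edges:
        return [[i] for i in range(len(points))]
    limit = cut_factor * median(length for _, _, length in edges)

    # Union-find over the edges that survive the cut.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for i, j, length in edges:
        if length <= limit:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group is a list of indices into `points` and serves as one coarse line candidate. A fuller pipeline would still need a re-connection step along the lines of the energy minimization described in the abstract, which this sketch does not attempt.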
Keywords :
"Robustness","Benchmark testing","Image segmentation","Surveillance","Image recognition","Integrated optics","Optical imaging"
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
DOI :
10.1109/ICDAR.2015.7333750