Title :
Text line extraction of curved document images using hybrid metric
Author :
Zuming Huang;Jie Gu;Gaofeng Meng;Chunhong Pan
Author_Institution :
Institute of Automation, Chinese Academy of Sciences (CASIA) No.95, Zhongguancun East Road, Beijing 100190, P.R. China
Abstract :
This paper proposes a novel approach to extracting text lines from curved document images captured from an opened, thick bound book or a curled document sheet. We first extract the connected components (CCs) of a binary image and remove the non-textual CCs. We then estimate the orientation of each CC through local projections and define a feature vector to describe each CC accordingly. Next, we design a hybrid metric based on the distances between CCs and construct the corresponding minimum spanning tree, which exploits the overall structure of the curved text lines. Finally, a tree pruning strategy is proposed to cluster the CCs into separate text lines. Experimental results on a wide variety of curved document images demonstrate the effectiveness and efficiency of the proposed method.
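The clustering stage described in the abstract (a distance between CCs, a minimum spanning tree, then pruning) can be illustrated with a minimal sketch. The abstract does not define the hybrid metric or the pruning rule, so everything specific here is an assumption made for illustration: each CC is reduced to a hypothetical `(x, y, theta)` tuple (centroid plus estimated orientation in radians), `hybrid_distance` penalizes the offset across the local text direction by a made-up factor `alpha`, and pruning is a plain threshold `max_edge` on the tree edge weights.

```python
import math


def hybrid_distance(a, b, alpha=3.0):
    """Hypothetical anisotropic distance between two CCs given as (x, y, theta).

    The offset along the mean orientation counts once; the offset across it
    is scaled by alpha. This is an illustrative stand-in, not the paper's metric.
    """
    theta = 0.5 * (a[2] + b[2])  # naive mean, adequate for small angles
    dx, dy = b[0] - a[0], b[1] - a[1]
    along = dx * math.cos(theta) + dy * math.sin(theta)
    across = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.hypot(along, alpha * across)


def minimum_spanning_tree(points, dist):
    """Prim's algorithm on the complete graph, O(n^2). Returns (u, v, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cluster_text_lines(points, max_edge, alpha=3.0):
    """Build the MST, drop edges heavier than max_edge, return index groups."""
    edges = minimum_spanning_tree(points, lambda a, b: hybrid_distance(a, b, alpha))
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, w in edges:
        if w <= max_edge:  # keep only short edges; longer ones are pruned
            root[find(u)] = find(v)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two horizontal rows of four CCs each, 12 pixels apart vertically.
ccs = [(x, 0.0, 0.0) for x in (0, 10, 20, 30)] + [(x, 12.0, 0.0) for x in (0, 10, 20, 30)]
lines = cluster_text_lines(ccs, max_edge=20.0)
```

In this toy input the rows are only slightly farther apart than neighbouring CCs within a row, so a plain Euclidean distance (`alpha=1.0`) with the same threshold merges both rows into one group, while the anisotropic weighting keeps them apart.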
Keywords :
"Measurement","Estimation","Histograms","Distortion","DH-HEMTs","Radon","Strips"
Conference_Title :
Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on
Electronic_ISBN :
2327-0985
DOI :
10.1109/ACPR.2015.7486504