DocumentCode :
3487949
Title :
Towards Generic Text-Line Extraction
Author :
Bukhari, Syed Saqib ; Shafait, Faisal ; Breuel, Thomas M.
Author_Institution :
Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
748
Lastpage :
752
Abstract :
Text-line extraction is the backbone of document image analysis. Over the past decades, a large number of text-line finding methods have been proposed, but these methods rely on certain assumptions about a target class of documents with respect to writing styles, digitization methods, intensity values, and scripts. There is no generic text-line finding method that can be robustly applied to a large variety of simple and complex document images. We previously introduced the ridge-based text-line finding method and published its initial results for curled text-line detection on camera-captured document images. In this paper, we demonstrate our ridge-based method as a generic text-line finding approach that can be robustly applied to a diverse collection of simple and complex document images. A comprehensive performance evaluation of the ridge-based method and its comparison with several state-of-the-art methods are presented in the paper. For this purpose, diverse categories of publicly available, standard datasets have been selected: UWIII (scanned, printed English script), DFKI-I (camera-captured, printed English script), UMD (handwritten Chinese, Hindi, and Korean scripts), ICDAR2007 handwritten segmentation contest (handwritten English, French, German, and Greek scripts), Arabic/Urdu (scanned, printed script), and Fraktur (scanned, calligraphic German script). Experiments on these datasets show that the ridge-based method achieves better text-line extraction results than those of the best-performing, domain-specific text-line finding methods. First, these results show that the ridge-based method is a generic text-line extraction method. Second, these results help the community assess the advantages of this method.
Keywords :
document image processing; feature extraction; Arabic-Urdu dataset; DFKI-I dataset; Fraktur dataset; ICDAR2007 handwritten segmentation contest dataset; UMD dataset; UWIII dataset; camera-captured document image; curled text-line detection; digitization methods; document image analysis; generic text-line extraction; intensity values; ridge-based text-line finding method; scripts; writing styles; Accuracy; Gray-scale; Image segmentation; Performance evaluation; Smoothing methods; Standards; Text analysis; Collection of Diverse Documents; Generic Text Line Extraction Method; Generic Layout Analysis; Performance Evaluation and Benchmarking; Ridge-based Text-Line Extraction Method;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.153
Filename :
6628718