DocumentCode :
3677402
Title :
Improved Hybrid Binarization based on Kmeans for Heterogeneous document processing
Author :
Mahmoud Soua;Rostom Kachouri;Mohamed Akil
Author_Institution :
Université
fYear :
2015
Firstpage :
210
Lastpage :
215
Abstract :
A growing number of scanned documents are converted into editable electronic representations. This conversion relies on the Optical Character Recognition (OCR) tool-chain. An OCR system generally depends on a binarization step that separates character strokes from the document background. In this context, one of the more robust binarization methods is the recently proposed Hybrid Binarization based on Kmeans (HBK). It effectively handles scanned documents that contain text on a simple background. In heterogeneous documents, however, HBK runs into problems when extracting foreground text from complex background images. Moreover, HBK assumes a dark foreground against a light background; when this assumption does not hold, it fails to render the correct binarization colors. In this paper, we propose an improved HBK method (I-HBK) that handles heterogeneous documents efficiently. Our proposal employs a layout analysis process that classifies document regions into text and image. Image regions are enhanced with Gamma Correction (GC) before HBK binarization. Text regions are processed directly with HBK, which preserves its effectiveness on text with a homogeneous background. To ensure robust color rendering in the binarized documents, independent of the input polarity, we control the labeling polarity of text and background through a pixel density-based technique. Our experiments on the LRDE and ICDAR datasets show that I-HBK outperforms HBK on heterogeneous documents in both F-measure and OCR accuracy.
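The pipeline outlined in the abstract (gamma correction for image regions, two-class clustering, then polarity control) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the HBK internals, the gamma value, or the exact density rule, so the sketch assumes a plain 1-D k-means on gray levels as a stand-in for HBK, a hypothetical default of gamma = 0.5, and the rule that the less frequent label is the text. All function names are hypothetical.

```python
def gamma_correct(pixels, gamma):
    """Apply power-law (gamma) correction to 8-bit gray levels."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def kmeans_two_clusters(pixels, iterations=20):
    """Plain 1-D k-means with k=2 (a stand-in for HBK, not HBK itself).

    Returns one label per pixel: 1 for the darker cluster, 0 for the lighter.
    """
    lo, hi = float(min(pixels)), float(max(pixels))
    for _ in range(iterations):
        dark = [p for p in pixels if abs(p - lo) <= abs(p - hi)]
        light = [p for p in pixels if abs(p - lo) > abs(p - hi)]
        if not dark or not light:
            break  # degenerate region: only one cluster is populated
        lo, hi = sum(dark) / len(dark), sum(light) / len(light)
    return [1 if abs(p - lo) <= abs(p - hi) else 0 for p in pixels]


def fix_polarity(labels):
    """Assumed density rule: text covers fewer pixels than background,
    so the minority label is mapped to foreground (1)."""
    ones = sum(labels)
    if ones > len(labels) - ones:
        return [1 - label for label in labels]
    return labels


def binarize_region(pixels, is_image_region, gamma=0.5):
    """Binarize one region; image regions are gamma-corrected first."""
    if is_image_region:
        pixels = gamma_correct(pixels, gamma)
    return fix_polarity(kmeans_two_clusters(pixels))


# Dark text on a light background and light text on a dark background
# both end up with the text pixels labeled 1.
dark_on_light = [250] * 8 + [10] * 2
light_on_dark = [10] * 8 + [250] * 2
print(binarize_region(dark_on_light, is_image_region=False))
print(binarize_region(light_on_dark, is_image_region=False))
```

In this sketch the polarity step is what makes the output independent of whether the input has dark or light text, which is the property the abstract attributes to the pixel density-based technique.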
Keywords :
"Optical character recognition software","Accuracy","Image color analysis","Robustness","Layout","Character recognition"
Publisher :
IEEE
Conference_Title :
Image and Signal Processing and Analysis (ISPA), 2015 9th International Symposium on
ISSN :
1845-5921
Type :
conf
DOI :
10.1109/ISPA.2015.7306060
Filename :
7306060