DocumentCode
1783736
Title
A new hybrid binarization method based on Kmeans
Author
Soua, Mahmoud ; Kachouri, Rostom ; Akil, Mohamed
Author_Institution
LIGM, ESIEE Paris, Noisy-Le-Grand, France
fYear
2014
fDate
21-23 May 2014
Firstpage
118
Lastpage
123
Abstract
Document binarization is a fundamental processing step toward Optical Character Recognition (OCR). It aims to separate the foreground text from the document background. In this article, we propose a novel binarization technique that combines local and global approaches using the Kmeans clustering algorithm. The proposed Hybrid Binarization based on Kmeans (HBK) performs a robust binarization on scanned documents. Through several experiments, we demonstrate that the HBK method improves binarization quality while minimizing the amount of distortion. Moreover, it outperforms several well-known state-of-the-art methods in the OCR evaluation.
Keywords
document image processing; learning (artificial intelligence); optical character recognition; pattern clustering; HBK method; Kmeans clustering algorithm; OCR evaluation; binarization quality; binarization technique; distortion amount; document binarization; hybrid binarization method; scanned documents; Character recognition; Clustering algorithms; Distortion measurement; Histograms; Optical character recognition software; Optical distortion; Robustness; Kmeans; OCR; Scanned documents; binarization
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications, Control and Signal Processing (ISCCSP), 2014 6th International Symposium on
Conference_Location
Athens
Type
conf
DOI
10.1109/ISCCSP.2014.6877830
Filename
6877830