Title :
A comparison of binarization methods for historical archive documents
Author :
He, J. ; Do, Q.D.M. ; Downton, A.C. ; Kim, J.H.
Author_Institution :
Dept. of Electron. Syst. Eng., Essex Univ., Colchester, UK
Date :
29 Aug.-1 Sept. 2005
Abstract :
This paper compares several alternative binarization algorithms for historical archive documents by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system utilising a commercial OCR engine. The algorithms evaluated are: global thresholding; Niblack's and Sauvola's algorithms; adaptive versions of Niblack's and Sauvola's algorithms; and Niblack's and Sauvola's algorithms applied to background-removed images. We found that, for our archive documents, Niblack's algorithm can achieve better performance than Sauvola's (which has been claimed as an evolution of Niblack's algorithm), and that it also achieved better performance than the internal binarization provided as part of the commercial OCR engine.
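For readers unfamiliar with the two local thresholding rules named in the abstract, the following is a minimal sketch of their commonly published forms: Niblack's threshold T = m + k*s and Sauvola's threshold T = m*(1 + k*(s/R - 1)), where m and s are the mean and standard deviation of the grey values in a window around each pixel. The function names, the window radius, and the default values of k and R are illustrative assumptions (typical textbook choices), not parameters taken from this paper, and the sketch does not cover the adaptive or background-removal variants the paper evaluates.

```python
from statistics import mean, pstdev


def niblack_threshold(window, k=-0.2):
    """Niblack's rule: T = m + k * s (k = -0.2 is an assumed, commonly used default)."""
    m = mean(window)
    s = pstdev(window)
    return m + k * s


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola's rule: T = m * (1 + k * (s / R - 1)).

    k = 0.5 and R = 128 are assumed defaults for 8-bit grey images.
    """
    m = mean(window)
    s = pstdev(window)
    return m * (1.0 + k * (s / r - 1.0))


def binarize(image, threshold_fn, radius=1):
    """Binarize a grey image given as a list of rows of values in 0..255.

    Each pixel is compared with a threshold computed from the square
    window of the given radius around it (clipped at the image border).
    Returns a list of rows with 1 for background and 0 for ink.
    """
    height = len(image)
    width = len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = threshold_fn(window)
            row.append(1 if image[y][x] > threshold else 0)
        result.append(row)
    return result
```

Passing `niblack_threshold` or `sauvola_threshold` as `threshold_fn` switches between the two rules on the same image, which mirrors the kind of side-by-side comparison the abstract describes, though without the OCR-based evaluation.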
Keywords :
character recognition; document image processing; history; word processing; Niblack algorithm; Sauvola algorithm; binarization methods; commercial OCR engine; end-to-end word recognition; global thresholding; historical archive documents; Clustering algorithms; Engines; Image color analysis; Image converters; Image recognition; Image segmentation; Optical character recognition software; Pixel; Pursuit algorithms; Text analysis
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.3