Title :
Stroke-Like Pattern Noise Removal in Binary Document Images
Author :
Agrawal, Mudit ; Doermann, David
Author_Institution :
Inst. of Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
Abstract :
This paper presents a two-phase algorithm for removing stroke-like pattern noise (SPN) from binary document images. In the first phase, supervised classification is used to learn script-independent features of prominent text components. In the second phase, the cohesiveness and stroke-width properties of those components are used to filter smaller text components and associate them with the prominent ones through unsupervised classification. This divide-and-conquer technique performs text extraction, and hence noise removal, at the diacritic level without assuming that large amounts of accurate component-level ground-truth data are available for training. The method was tested on a collection of degraded and noisy binary Arabic text documents, both machine-printed and handwritten. Results show pixel-level precision and recall of 86% and 90%, respectively, for noise pixels.
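The pixel-level precision and recall quoted for noise pixels can be illustrated with a minimal sketch. It assumes that the predicted and ground-truth noise masks are equal-sized 2D grids in which 1 marks a noise pixel. The function name `noise_precision_recall` and the mask layout are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def noise_precision_recall(predicted, truth):
    """Return (precision, recall) for noise pixels.

    predicted, truth: equal-sized 2D lists of 0/1, where 1 marks a pixel
    labeled as noise (an assumed encoding, not the paper's data format).
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # noise pixel correctly flagged as noise
            elif p:
                fp += 1      # non-noise pixel wrongly flagged as noise
            elif t:
                fn += 1      # noise pixel that was missed
    # Guard against empty denominators (no predicted or no true noise pixels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [[1, 1, 0, 0],
                 [0, 1, 0, 0]]
    truth = [[1, 0, 0, 0],
             [0, 1, 1, 1]]
    print(noise_precision_recall(predicted, truth))
```

In this toy case two of the three flagged pixels are true noise and two of the four true noise pixels are found, so precision is 2/3 and recall is 1/2.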
Keywords :
divide and conquer methods; document image processing; image classification; text analysis; Arabic text documents; binary document images; divide-and-conquer technique; script-independent prominent text component features; stroke-width properties; text extraction; two-phased stroke-like pattern noise removal; unsupervised classification; Accuracy; Clutter; Noise; Noise measurement; Text analysis; Training; Transforms; degraded ruled-line removal; low-density languages; noise; salt-n-pepper; speckle removal; stroke-like pattern noise
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.13