DocumentCode
2142060
Title
Stroke-Like Pattern Noise Removal in Binary Document Images
Author
Agrawal, Mudit ; Doermann, David
Author_Institution
Inst. of Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
17
Lastpage
21
Abstract
This paper presents a two-phased stroke-like pattern noise (SPN) removal algorithm for binary document images. The proposed approach aims at understanding script-independent prominent text component features using supervised classification as a first step. It then uses their cohesiveness and stroke-width properties to filter and associate smaller text components with them using an unsupervised classification technique. In order to perform text extraction, and hence noise removal, at the diacritic level, this divide-and-conquer technique does not assume the availability of accurate and large amounts of ground-truth data at the component level for training purposes. The method was tested on a collection of degraded and noisy, machine-printed and handwritten binary Arabic text documents. Results show pixel-level precision and recall of 86% and 90%, respectively, for noise pixels.
Keywords
divide and conquer methods; document image processing; image classification; text analysis; Arabic text documents; binary document images; divide-and-conquer technique; script-independent prominent text component features; stroke-width properties; text extraction; two-phased stroke-like pattern noise removal; unsupervised classification; Accuracy; Clutter; Noise; Noise measurement; Text analysis; Training; Transforms; degraded ruled-line removal; low-density languages; noise; salt-n-pepper; speckle removal; stroke-like pattern noise;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.13
Filename
6065268