• DocumentCode
    2142060
  • Title
    Stroke-Like Pattern Noise Removal in Binary Document Images
  • Author
    Agrawal, Mudit ; Doermann, David
  • Author_Institution
    Inst. of Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    17
  • Lastpage
    21
  • Abstract
    This paper presents a two-phased stroke-like pattern noise (SPN) removal algorithm for binary document images. As a first step, the proposed approach aims to understand script-independent features of prominent text components using supervised classification. It then uses their cohesiveness and stroke-width properties to filter smaller text components and associate them with the prominent ones using an unsupervised classification technique. To perform text extraction, and hence noise removal, at the diacritic level, this divide-and-conquer technique does not assume the availability of large amounts of accurate component-level ground-truth data for training. The method was tested on a collection of degraded and noisy, machine-printed and handwritten binary Arabic text documents. Results show pixel-level precision and recall of 86% and 90%, respectively, for noise pixels.
  • Keywords
    divide and conquer methods; document image processing; image classification; text analysis; Arabic text documents; binary document images; divide-and-conquer technique; script-independent prominent text component features; stroke-width properties; text extraction; two-phased stroke-like pattern noise removal; unsupervised classification; Accuracy; Clutter; Noise; Noise measurement; Text analysis; Training; Transforms; degraded ruled-line removal; low-density languages; noise; salt-n-pepper; speckle removal; stroke-like pattern noise
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2011.13
  • Filename
    6065268