  • DocumentCode
    3487845
  • Title
    Sparse Document Image Coding for Restoration
  • Author
    Kumar, Vipin; Bansal, Ankur; Tulsiyan, Goutam Hari; Mishra, Anadi; Namboodiri, Anoop; Jawahar, C.V.
  • Author_Institution
    Center for Visual Inf. Technol., IIIT Hyderabad, Hyderabad, India
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    713
  • Lastpage
    717
  • Abstract
    Sparse-representation-based image restoration techniques have been shown to be successful in solving various inverse problems, such as denoising, inpainting, and super-resolution, on natural images and videos. In this paper, we explore the use of sparse-representation-based methods specifically to restore degraded document images. While natural images form a very small subset of all possible images, which makes sparse representation possible, document images are significantly more restricted and are expected to be ideally suited for such a representation. However, the binary nature of textual document images makes dictionary learning and coding techniques unsuitable for direct application. We leverage the fact that different characters possess similar strokes, curves, and edges, and learn a dictionary that gives a sparse decomposition for patches. Experimental results show significant improvement in image quality and OCR performance on documents collected from a variety of sources, such as magazines and books. This method is, therefore, ideally suited for restoring highly degraded images in repositories such as digital libraries.
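    As a hedged illustration of what a "sparse decomposition for patches" means, the Python sketch below codes a flattened 8x8 patch as a combination of a few dictionary atoms and then rebuilds the patch from those atoms alone. It is not the authors' method: the abstract does not name a coding algorithm, so Orthogonal Matching Pursuit (OMP) is swapped in as a standard greedy sparse coder, and the fixed identity-plus-DCT dictionary, the helper names `dct_basis` and `omp`, the sparsity level, and the synthetic noisy patch are all hypothetical stand-ins for a learned dictionary and real document patches.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; column j is the j-th cosine atom."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    t = np.arange(n)[None, :]  # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so every row has unit norm
    return C.T  # transpose so atoms are columns

def omp(D, y, k, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily select at most k unit-norm
    columns of D, refitting y on the selected columns by least squares."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break  # y is already explained by the chosen atoms
        idx = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Hypothetical overcomplete dictionary for flattened 8x8 patches (64 pixels):
# 64 "spike" atoms (identity) plus 64 DCT atoms; a real system would learn it.
n = 64
D = np.hstack([np.eye(n), dct_basis(n)])

# Hypothetical clean patch built from 3 atoms, plus small Gaussian noise.
rng = np.random.default_rng(0)
true_x = np.zeros(2 * n)
true_x[[5, 70, 90]] = [1.5, -2.0, 1.8]
clean = D @ true_x
noisy = clean + 0.005 * rng.standard_normal(n)

# Sparse-code the noisy patch with 3 atoms and reconstruct it.
x_hat = omp(D, noisy, k=3)
restored = D @ x_hat
print(np.count_nonzero(x_hat), np.linalg.norm(restored - clean), np.linalg.norm(noisy - clean))
```

    Because the restored patch is the least-squares projection of the noisy patch onto only the few atoms OMP selects, most of the noise, which is spread over all 64 pixel dimensions, is discarded. In the approach the abstract describes, a dictionary learned from document patches, whose atoms capture strokes, curves, and edges shared across characters, would take the place of this fixed one.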
  • Keywords
    document image processing; image coding; image representation; image restoration; learning (artificial intelligence); text analysis; OCR performance; degraded document image restoration; dictionary learning; image quality; natural images; sparse decomposition; sparse document image coding; sparse representation based image restoration techniques; textual document images; Degradation; Dictionaries; Image coding; Image restoration; Noise; Noise measurement; Optical character recognition software; Dictionary learning; Document restoration; Sparse representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.146
  • Filename
    6628711