• DocumentCode
    3485859
  • Title

    Self Learning Classification for Degraded Document Images by Sparse Representation

  • Author

    Bolan Su ; Shuangxuan Tian ; Shijian Lu ; Thien Anh Dinh ; Chew Lim Tan

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    155
  • Lastpage
    159
  • Abstract
    Document image binarization is a technique for segmenting text from the background region of a document image. It is a challenging task because of the high intensity variations in the document foreground and background. Recently, a series of document image binarization contests (DIBCOs) has been held, drawing great research interest to this area. Several document binarization techniques have been proposed and achieve great performance on the contest datasets. However, those techniques may not perform well on all kinds of degraded document images, because it is difficult to design a classification method that correctly models the non-uniform degraded document background and the text foreground simultaneously. In this paper, we propose a self-learning classification framework that combines the binary outputs of different binarization methods. The proposed framework makes use of sparse representation to re-classify the document pixels and produces better binary results. The experimental results on the recent DIBCO contests show the great performance and robustness of our proposed framework on different kinds of degraded document images.
  • Keywords
    document image processing; image classification; image representation; learning (artificial intelligence); text analysis; DIBCO; degraded document image classification; document foreground; document image binarization contests; document pixel reclassification; high intensity variations; nonuniform degraded document background; self-learning classification framework; sparse representation; text foreground; text segmentation; Equations; Feature extraction; Robustness; Testing; Text analysis; Training; Vectors; Document Image Binarization; Self Learning Classification; Sparse Representation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.38
  • Filename
    6628603