• DocumentCode
    2142605
  • Title

    Ternary Entropy-Based Binarization of Degraded Document Images Using Morphological Operators

  • Author

    Le, T. Hoang Ngan ; Bui, Tien D. ; Suen, Ching Y.

  • Author_Institution
    Dept. Comput. Sci. & Software Eng., Concordia Univ., Montreal, QC, Canada
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    114
  • Lastpage
    118
  • Abstract
    A vast number of historical and badly degraded document images can be found in libraries and in public and national archives. Due to the complex nature of the different artifacts, such poor-quality documents are hard to read and to process. In this paper, a novel adaptive binarization algorithm using a ternary entropy-based approach is proposed. Given an input image, the contrast of intensity is first estimated by a grayscale morphological closing operator. A double threshold is generated by our Shannon entropy-based ternarizing method to classify pixels into text, near-text, and non-text regions. The pixels in the second (near-text) region are relabeled using the local mean and standard deviation. Our proposed method classifies noise into two categories, which are processed by binary morphological operators, shrink and swell filters, and a graph searching strategy. The method is tested with three databases that were used in the Document Image Binarization Contest 2009 (DIBCO 2009), the Handwriting Document Image Binarization Contest 2010 (H-DIBCO 2010), and the International Conference on Frontiers in Handwriting Recognition 2010 (ICFHR 2010). The evaluation is based upon nine distinct measures. Experimental results show that our proposed algorithm outperforms other state-of-the-art methods.
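The abstract names the main stages but not their formulas, so the following is only a rough sketch of the double-threshold and relabeling steps under stated assumptions. It swaps in a Kapur-style maximum-entropy search for the paper's Shannon entropy-based ternarizing method, and a Niblack-style rule (text if the pixel is below local mean + k * local standard deviation) for the near-text relabeling. All function names, the window radius, and k = -0.2 are hypothetical choices, dark text on a light background is assumed, and the morphological closing and noise-removal stages are omitted.

```python
import math

TEXT, NEAR_TEXT, NON_TEXT = 0, 1, 2  # hypothetical label codes


def class_entropy(hist, lo, hi):
    """Shannon entropy of the normalized histogram slice hist[lo:hi]."""
    total = sum(hist[lo:hi])
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in hist[lo:hi]:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def ternary_thresholds(hist):
    """Exhaustively pick (t1, t2), t1 < t2, maximizing the summed entropy
    of the three classes [0, t1), [t1, t2), [t2, n). Assumes len(hist) >= 3."""
    n = len(hist)
    best_score, best_pair = -1.0, (1, 2)
    for t1 in range(1, n - 1):
        for t2 in range(t1 + 1, n):
            score = (class_entropy(hist, 0, t1)
                     + class_entropy(hist, t1, t2)
                     + class_entropy(hist, t2, n))
            if score > best_score:
                best_score, best_pair = score, (t1, t2)
    return best_pair


def classify(value, t1, t2):
    """Map a gray level to text / near-text / non-text (dark text assumed)."""
    if value < t1:
        return TEXT
    if value < t2:
        return NEAR_TEXT
    return NON_TEXT


def relabel_near_text(image, labels, radius=1, k=-0.2):
    """Resolve NEAR_TEXT pixels with a Niblack-style local rule (an assumption):
    text if pixel < local mean + k * local standard deviation."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != NEAR_TEXT:
                continue
            # Square window around (r, c), clipped at the image border.
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            threshold = mean + k * math.sqrt(var)
            out[r][c] = TEXT if image[r][c] < threshold else NON_TEXT
    return out
```

A typical call would build a gray-level histogram, obtain `t1, t2 = ternary_thresholds(hist)`, label each pixel with `classify`, and then pass the label map to `relabel_near_text`. The exhaustive search is quadratic in the number of gray levels, which is acceptable for a sketch but slow for 256 levels in pure Python.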
  • Keywords
    document image processing; entropy; graph theory; image classification; image colour analysis; mathematical operators; DIBCO 2009; Document Image Binarization Contest 2009; H-DIBCO 2010; Handwriting Document Image Binarization Contest 2010; ICFHR 2010; International Conference on Frontiers in Handwriting Recognition 2010; Shannon entropy-based ternarizing method; adaptive binarization algorithm; binary morphological operators; degraded document images; double-threshold; graph searching strategy; grayscale morphological closing operator; intensity contrast; library; national archives; pixel classification; poor quality documents; public archives; shrink filters; standard deviation; swell filters; ternary entropy-based approach; ternary entropy-based binarization; Databases; Entropy; Frequency modulation; Gray-scale; Histograms; Noise; Testing; Shannon entropy; binarization; degraded document image; morphological operators; ternary entropy-based;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.32
  • Filename
    6065287