• DocumentCode
    2490053
  • Title
    A recursive Otsu thresholding method for scanned document binarization
  • Author
    Nina, Oliver; Morse, Bryan; Barrett, William
  • fYear
    2011
  • fDate
    5-7 Jan. 2011
  • Firstpage
    307
  • Lastpage
    314
  • Abstract
    The use of digital images of scanned handwritten historical documents has increased in recent years, especially with the online availability of large document collections. However, the sheer number of images in some of these collections makes them cumbersome to read and process manually, which makes automated processing increasingly important. A key step in the recognition and retrieval of such documents is binarization, the separation of document text from the page's background. Binarization of images of historical documents that have been affected by degradation or are otherwise of poor image quality is difficult and continues to be a topic of research in the field of image processing. This paper presents a novel approach to this problem, including two primary variations. One combines a recursive extension of Otsu thresholding and selective bilateral filtering to allow automatic binarization and segmentation of handwritten text images. The other also builds on the recursive Otsu method and adds improved background normalization and a post-processing step to the algorithm to make it more robust and to perform adequately even for images that present bleed-through artifacts. Our results show that these techniques segment the text in historical documents comparably to, and in some cases better than, many state-of-the-art approaches, based on their performance as evaluated using the dataset from the recent ICDAR 2009 Document Image Binarization Contest.
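    The abstract names recursive Otsu thresholding without defining it. As a rough illustration only, the sketch below implements the classic Otsu threshold on an 8-bit gray-level histogram (choosing the split that maximizes between-class variance) plus a hypothetical recursion, `recursive_otsu`, that re-applies Otsu to the lighter class so later thresholds can separate faint strokes from the paper. The recursion rule, the stopping condition, and both function names are assumptions for illustration, not the authors' method; the paper's selective bilateral filtering, background normalization, and post-processing steps are not modeled.

```python
import numpy as np


def otsu_threshold(values) -> int:
    """Classic Otsu threshold for 8-bit gray levels.

    Returns t such that pixels <= t form the dark class. t maximizes the
    between-class variance
        sigma_b^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))).
    """
    v = np.asarray(values, dtype=np.uint8).ravel()
    if v.size == 0:
        raise ValueError("otsu_threshold needs at least one pixel")
    prob = np.bincount(v, minlength=256).astype(np.float64) / v.size
    omega = np.cumsum(prob)                 # P(pixel <= t)
    mu = np.cumsum(prob * np.arange(256))   # first moment of gray levels <= t
    mu_total = mu[-1]                       # global mean gray level
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0              # one class empty: not a valid split
    return int(np.argmax(sigma_b2))         # first t reaching the maximum


def recursive_otsu(values, max_levels: int = 4) -> list[int]:
    """HYPOTHETICAL recursive variant (an assumption, not the paper's rule).

    After each Otsu split, keep only the lighter class and split it again.
    Stops after max_levels splits or when the remaining pixels have fewer
    than two distinct gray levels. Returns the thresholds in the order found.
    """
    subset = np.asarray(values, dtype=np.uint8).ravel()
    thresholds: list[int] = []
    for _ in range(max_levels):
        if np.unique(subset).size < 2:
            break
        t = otsu_threshold(subset)
        thresholds.append(t)
        subset = subset[subset > t]         # lighter (background-side) class only
    return thresholds


if __name__ == "__main__":
    # Synthetic gray levels: dark ink (20), faint strokes (120), paper (230).
    gray = np.array([20] * 60 + [120] * 20 + [230] * 20, dtype=np.uint8)
    print(otsu_threshold(gray))   # 20
    print(recursive_otsu(gray))   # [20, 120]
```

    In this toy input a single Otsu split at 20 isolates only the darkest ink; the second, hypothetical pass over the lighter pixels finds 120, which separates the faint strokes from the paper. A real binarizer would also need a principled rule for deciding which of these thresholds to keep.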
  • Keywords
    document image processing; image segmentation; text analysis; ICDAR 2009 document image binarization contest; background normalization; recursive Otsu thresholding method; scanned document binarization; scanned handwritten historical documents; selective bilateral filtering; Approximation methods; Degradation; Estimation; Hysteresis; Noise; Optical character recognition software; Pixel;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications of Computer Vision (WACV), 2011 IEEE Workshop on
  • Conference_Location
    Kona, HI
  • ISSN
    1550-5790
  • Print_ISBN
    978-1-4244-9496-5
  • Type
    conf
  • DOI
    10.1109/WACV.2011.5711519
  • Filename
    5711519