• DocumentCode
    1362554
  • Title

    The Effect of Border Noise on the Performance of Projection-Based Page Segmentation Methods

  • Author

    Shafait, Faisal ; Breuel, Thomas M.

  • Author_Institution
    Multimedia Anal. & Data Min. (MADM) Competence Center, German Res. Center for Artificial Intell. (DFKI GmbH), Kaiserslautern, Germany
  • Volume
    33
  • Issue
    4
  • fYear
    2011
  • fDate
    4/1/2011
  • Firstpage
    846
  • Lastpage
    851
  • Abstract
    Projection methods have been used in the analysis of bitonal document images for different tasks such as page segmentation and skew correction for more than two decades. However, these algorithms are sensitive to the presence of border noise in document images. Border noise can appear along the page border due to scanning or photocopying. Over the years, several page segmentation algorithms have been proposed in the literature. Some of these algorithms have come into widespread use due to their high accuracy and robustness with respect to border noise. This paper addresses two important questions in this context: 1) Can existing border noise removal algorithms clean up document images to a degree required by projection methods to achieve competitive performance? 2) Can projection methods reach the performance of other state-of-the-art page segmentation algorithms (e.g., Docstrum or Voronoi) for documents where border noise has successfully been removed? We perform extensive experiments on the University of Washington (UW-III) data set with six border noise removal methods. Our results show that although projection methods can achieve the accuracy of other state-of-the-art algorithms on the cleaned document images, existing border noise removal techniques cannot clean up documents captured under a variety of scanning conditions to the degree required to achieve that accuracy.
  • Keywords
    document image processing; image denoising; image segmentation; border noise removal; document cleanup; document image; page border; page segmentation; projection method; skew correction; Accuracy; Algorithm design and analysis; Layout; Noise; Pixel; Text analysis; Document page segmentation; OCR; performance evaluation; Algorithms; Image Enhancement; Image Processing, Computer-Assisted; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2010.194
  • Filename
    5611543