• DocumentCode
    3330343
  • Title
    Document page decomposition by the bounding-box project
  • Author
    Ha, Jaekyu ; Haralick, Robert M. ; Phillips, Ihsin T.
  • Author_Institution
    Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
  • Volume
    2
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    1119
  • Abstract
    This paper describes a method for extracting words, textlines and text blocks by analyzing the spatial configuration of the bounding boxes of connected components in a given document image. The basic idea is that connected components of black pixels can be used as computational units in document image analysis. In this paper, the problem of extracting words, textlines and text blocks is viewed as a clustering problem in the 2-dimensional discrete domain. Our main strategy is to use profiling analysis to measure the horizontal or vertical gaps between components (or groups of components) during image segmentation. For this purpose, we compute the smallest rectangular box, called the bounding box, which circumscribes a connected component. Those boxes are projected horizontally and/or vertically, and local and global projection profiles are analyzed for word, textline and text-block segmentation. In the last step of segmentation, the document decomposition hierarchy is produced from these segmented objects.
  • Keywords
    document image processing; image segmentation; 2-dimensional discrete domain; black pixels; bounding box; bounding-box project; clustering problem; document decomposition hierarchy; document image; document image analysis; document page decomposition; image segmentation; spatial configuration; text blocks; textlines; words extraction; Computer science; Image analysis; Image databases; Image segmentation; Optical character recognition software; Pixel; Printing; Production; Protocols; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.1995.602115
  • Filename
    602115
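
The abstract's core idea (project bounding boxes onto an axis, then cut at gaps in the resulting profile) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(x0, y0, x1, y1)` box layout, and the fixed thresholds `line_gap` and `word_gap` are all assumptions made for illustration, and the record does not say how the paper actually chooses its gap thresholds. The sketch also stops at words and textlines and does not build text blocks or the decomposition hierarchy.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]
Run = Tuple[int, int]


def projection_profile(boxes: List[Box], axis: str, length: int) -> List[int]:
    """Count how many boxes cover each coordinate along the given axis."""
    profile = [0] * length
    for x0, y0, x1, y1 in boxes:
        lo, hi = (y0, y1) if axis == "y" else (x0, x1)
        for i in range(lo, hi + 1):
            profile[i] += 1
    return profile


def split_at_gaps(profile: List[int], min_gap: int) -> List[Run]:
    """Return inclusive (start, end) runs of nonzero profile values.

    Runs separated by fewer than `min_gap` empty positions are merged.
    """
    runs: List[Run] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))

    merged: List[Run] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment(boxes: List[Box], width: int, height: int,
            line_gap: int = 1, word_gap: int = 3):
    """Split boxes into textlines (vertical profile), then words (horizontal profile).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    lines = []
    y_profile = projection_profile(boxes, "y", height)
    for top, bottom in split_at_gaps(y_profile, line_gap):
        members = [b for b in boxes if top <= b[1] and b[3] <= bottom]
        x_profile = projection_profile(members, "x", width)
        lines.append(((top, bottom), split_at_gaps(x_profile, word_gap)))
    return lines


# Three components on one line (two close together, one farther away)
# and a single component on a second line.
example_boxes = [(0, 0, 2, 4), (4, 0, 6, 4), (12, 0, 14, 4), (0, 10, 3, 14)]
result = segment(example_boxes, width=20, height=20)
print(result)
```

In this toy input the first two boxes are one pixel apart and merge into a single word, while the third box is five pixels away and becomes its own word. In practice the bounding boxes would come from a connected-component labeling step on the binarized page, which is outside the scope of this sketch.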