  • DocumentCode
    311083
  • Title
    Block selection: a method for segmenting a page image of various editing styles
  • Author
    Wang, Shin-Ywan; Yagasaki, Toshiaki
  • Author_Institution
    Canon Inf. Syst., Costa Mesa, CA, USA
  • Volume
    1
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    128
  • Abstract
    This paper presents a page segmentation method called block selection, which not only segments the page image into categorized blocks but also provides a novel tree structure that represents the page blocks for selection. Rather than only classifying text and non-text areas, block selection can identify the major document elements, such as text, pictures, tables, frames and lines. This ability makes block selection suitable for a wider range of document processing applications. To make block selection practical for various document styles, the method removes many of the restrictions that some existing techniques place on the document. The document language can be English-like, Kanji-like, or both. Text can be horizontal, vertical, slanted, or mixed. The editing style of the document is unconstrained. No skew correction is required, regardless of the document style. The resulting blocks are described by a hierarchical tree that reflects the page arrangement in the “object” sense. This structure can be used efficiently for subsequent storage, retrieval, or other manipulation. Possible applications of the proposed method are discussed.
  • Keywords
    document image processing; image classification; image representation; image segmentation; optical character recognition; text editing; tree data structures; English; Kanji; OCR; block selection; categorized blocks; document processing applications; editing styles; hierarchical tree; page arrangement; page blocks; page image segmentation; skew correction; text classification; tree structure; Application software; Data mining; Image analysis; Image converters; Image segmentation; Indexing; Information systems; Object detection; Optical character recognition software; Tree data structures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the Third International Conference on Document Analysis and Recognition, 1995
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.1995.598959
  • Filename
    598959