• DocumentCode
    2630867
  • Title

    Document image segmentation and text area ordering

  • Author

    Saitoh, Takashi ; Tachikawa, Michiyoshi ; Yamaai, Toshifumi

  • Author_Institution
    Ricoh R&D Group, Yokohama, Kanagawa, Japan
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    323
  • Lastpage
    329
  • Abstract
    A system for document image segmentation and text area ordering is described and applied to complex printed page layouts in both Japanese and English. No assumption is made about the shape of blocks, so the segmentation technique can handle skewed images without skew correction as well as documents whose columns are not rectangular. The technique follows a bottom-up strategy: connected components are extracted from the reduced image and classified according to their local information. The connected components are merged into lines, and the lines are merged into areas. Extracted text areas are classified as body, caption, header, or footer. A tree graph of the layout of the body texts is built, and the reading order of the texts is obtained by preorder traversal of the graph. The authors introduce the influence range of each node, a procedure for the title part, and extraction of the white horizontal separator, which together make it possible to get good results on various documents. The total system is fast and compact.
  • Keywords
    document handling; document image processing; feature extraction; image classification; image segmentation; word processing; English complex printed page layouts; Japanese; body texts; bottom-up strategy; connected components; document image segmentation; influence range; local information; preorder traversal; segmentation technique; text area ordering; text areas; tree graph; white horizontal separator; Data mining; Image converters; Image segmentation; Optical character recognition software; Particle separators; Partitioning algorithms; Research and development; Shape; Streaming media; Tree graphs;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type

    conf

  • DOI
    10.1109/ICDAR.1993.395722
  • Filename
    395722
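
The ordering step named in the abstract (a tree graph of body text areas read out by preorder traversal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `TextArea` class, the `reading_order` function, and the hand-built example tree are hypothetical, and the abstract does not say how the authors construct the tree or apply the influence range of a node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextArea:
    """Hypothetical node: one body text area in the layout tree."""
    label: str
    # Children are assumed to be stored already in left-to-right order.
    children: List["TextArea"] = field(default_factory=list)


def reading_order(root: TextArea) -> List[str]:
    """Return the labels in preorder: a node first, then each child subtree."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.label)
        # Push children reversed so the first child is visited next.
        stack.extend(reversed(node.children))
    return order


# Hand-built example tree (an assumption, not taken from the paper):
# a title spanning two columns, with the first column split into two areas.
page = TextArea("page", [
    TextArea("title", [
        TextArea("column-1", [TextArea("column-1-lower")]),
        TextArea("column-2"),
    ]),
])

print(reading_order(page))
```

In this sketch the whole of the first column is read before the second column begins, which is the behaviour preorder traversal gives for any tree whose children are stored in reading order.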