• DocumentCode
    2900128
  • Title
    Document page segmentation and layout analysis using soft ordering
  • Author
    Mitchell, Phillip E.; Yan, Hong
  • Author_Institution
    Sch. of Electr. & Inf. Eng., Sydney Univ., NSW, Australia
  • Volume
    1
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    458
  • Abstract
    This paper presents a novel algorithm for layout analysis of document images. A major component of this algorithm is the independent segmentation algorithm that identifies text and graphics regions. The segmentation algorithm first locates document patterns and then classifies them using run-length characteristics, spread analysis and adjacency relations. A key feature of the layout analysis algorithm is soft ordering, which orders regions in a more logical way and allows some overlap between separate regions. This is very useful for processing documents that are slightly skewed or irregular in layout. The algorithm has been tested on many different documents and can successfully recognise single and multicolumn documents, even when the column format changes several times on one page. Furthermore, it can process documents with text tightly wrapped around graphics and documents that are slightly skewed.
  • Keywords
    document image processing; image segmentation; adjacency relations; document images; document layout analysis; document page segmentation; graphics; graphics region identification; independent segmentation algorithm; multicolumn documents; region overlap; run-length characteristics; skewed documents; soft ordering; spread analysis; text region identification; Algorithm design and analysis; Graphics; Image analysis; Image segmentation; Independent component analysis; Layout; Pattern analysis; Performance analysis; Testing; Text analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2000. Proceedings. 15th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-0750-6
  • Type
    conf
  • DOI
    10.1109/ICPR.2000.905375
  • Filename
    905375
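The abstract names run-length characteristics as one of the cues used to separate text from graphics, but does not give the exact features or thresholds. The sketch below is a generic illustration of that kind of cue, not the authors' method: it assumes a region is a binary image given as a list of rows (1 = black pixel), measures horizontal black runs, and applies a hypothetical rule in which short runs suggest text strokes and long runs suggest graphics. The function names and the `max_text_run` threshold are illustrative assumptions.

```python
from statistics import mean


def black_runs(row):
    """Return the lengths of consecutive runs of 1s (black pixels) in one binary row."""
    runs = []
    count = 0
    for px in row:
        if px:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:  # close a run that reaches the end of the row
        runs.append(count)
    return runs


def region_run_stats(region):
    """Mean and maximum horizontal black run length over all rows of a region."""
    runs = [r for row in region for r in black_runs(row)]
    if not runs:  # an all-white region has no black runs
        return 0.0, 0
    return mean(runs), max(runs)


def classify_region(region, max_text_run=8):
    """Hypothetical rule: if no black run exceeds max_text_run, call it text.

    max_text_run is an illustrative threshold, not a value from the paper.
    """
    _, longest = region_run_stats(region)
    return "text" if longest <= max_text_run else "graphics"


if __name__ == "__main__":
    text_like = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 1, 0, 1, 1]]
    graphic_like = [[1] * 20, [1] * 20]
    print(classify_region(text_like), classify_region(graphic_like))
```

A fixed threshold like this would be sensitive to scan resolution and font size; a real system would more likely normalise run lengths against an estimated character height and combine them with other cues, such as the spread analysis and adjacency relations the abstract mentions.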