• DocumentCode
    1544042
  • Title

    A fast algorithm for bottom-up document layout analysis

  • Author

    Simon, Aniko ; Pret, Jean Christophe ; Johnson, A. Peter

  • Author_Institution
    Sch. of Chem., Leeds Univ., UK
  • Volume
    19
  • Issue
    3
  • fYear
    1997
  • fDate
    3/1/1997 12:00:00 AM
  • Firstpage
    273
  • Lastpage
    277
  • Abstract
    This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLIDE (Chemical Literature Data Extraction) system, but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
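    The general idea in the abstract (Kruskal-style merging of page components, with a union-find structure that uses path compression) can be illustrated with a minimal sketch. It is not the authors' implementation: the bounding-box gap used as the distance, the `max_gap` threshold, and the function names are all assumptions made for illustration, since the abstract does not define the paper's special distance metric or its heuristics. The sketch also compares all pairs of components, so it does not reproduce the linear complexity claimed in the abstract.

```python
from itertools import combinations


def find(parent, i):
    """Return the root of i, compressing the path (path halving) on the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # point i at its grandparent
        i = parent[i]
    return i


def box_gap(a, b):
    """Hypothetical distance: larger of the horizontal and vertical gaps
    between two boxes given as (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def group_components(boxes, max_gap):
    """Merge components Kruskal-style: visit candidate edges in order of
    increasing distance and union their endpoints until the distance
    exceeds max_gap (an assumed stopping threshold)."""
    parent = list(range(len(boxes)))
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for dist, i, j in edges:
        if dist > max_gap:
            break  # all remaining edges are longer
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:
            parent[root_j] = root_i  # union the two groups
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())
```

    For example, two neighbouring character boxes and one distant box, grouped with an assumed threshold of 5 units, yield one two-member group and one singleton: `group_components([(0, 0, 10, 10), (12, 0, 22, 10), (100, 100, 110, 110)], 5)` returns `[[0, 1], [2]]`.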
  • Keywords
    computational complexity; document image processing; heuristic programming; image segmentation; CLIDE; Chemical Literature Data Extraction; block alignments; bottom-up document layout analysis; fast algorithm; heuristics; path-compression; physical page structure; text spacing; Algorithm design and analysis; Chemical analysis; Chemical processes; Data mining; Graphics; Image segmentation; Independent component analysis; Layout; Optical character recognition software; Text analysis
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/34.584106
  • Filename
    584106