  • DocumentCode
    2629167
  • Title
    Image based typographic analysis of documents
  • Author
    Doermann, David S.; Furuta, Richard
  • Author_Institution
    Center for Autom. Res., Maryland Univ., College Park, MD, USA
  • fYear
    1993
  • fDate
    20-22 Oct 1993
  • Firstpage
    769
  • Lastpage
    773
  • Abstract
    An approach to image-based typographic analysis of documents is presented. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation that preserves the two-dimensional layout, the read order, and the attributes of document components. From this hierarchical representation of the page layout, relevant typographic features such as margins, line and character spacing, and figure placement are extracted and analyzed.
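The abstract describes measuring margins, line spacing, and character spacing from a synthesized page layout. The following is a minimal Python sketch of that measurement step only, built on assumptions that do not come from the paper: the page synthesizer is assumed to emit positioned character boxes in points (the hypothetical `Glyph` type), the page is single-column with no figures, and the paper's hierarchical representation is reduced to a flat grouping of glyphs into lines. The names `Glyph`, `group_into_lines`, and `typographic_features` are illustrative, not the authors' API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Glyph:
    """A positioned character box (hypothetical output of a page synthesizer)."""
    char: str
    x: float         # left edge, in points from the page's left edge
    baseline: float  # baseline y, in points from the page's top edge
    width: float
    height: float    # extent above the baseline


def group_into_lines(glyphs, tol=1.0):
    """Cluster glyphs whose baselines lie within `tol` points into text lines.

    Lines come back top-to-bottom and glyphs left-to-right: a crude
    single-column stand-in for read order (an assumption, not the paper's method).
    """
    lines = []
    for g in sorted(glyphs, key=lambda g: (g.baseline, g.x)):
        if lines and abs(lines[-1][0].baseline - g.baseline) <= tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    return [sorted(line, key=lambda g: g.x) for line in lines]


def typographic_features(glyphs, page_width, page_height):
    """Measure margins, median baseline-to-baseline spacing, and median inter-glyph gap."""
    lines = group_into_lines(glyphs)
    margins = {
        "left": min(g.x for g in glyphs),
        "right": page_width - max(g.x + g.width for g in glyphs),
        "top": min(g.baseline - g.height for g in glyphs),
        # Measured from the lowest baseline; descenders are ignored in this sketch.
        "bottom": page_height - max(g.baseline for g in glyphs),
    }
    baselines = [line[0].baseline for line in lines]
    leading = [b2 - b1 for b1, b2 in zip(baselines, baselines[1:])]
    gaps = [b.x - (a.x + a.width) for line in lines for a, b in zip(line, line[1:])]
    return {
        "margins": margins,
        "line_spacing": median(leading) if leading else None,
        "char_spacing": median(gaps) if gaps else None,
    }


# Usage: two short lines on a US Letter page (612 x 792 pt); values are made up.
page = [
    Glyph("H", 72.0, 100.0, 7.0, 7.0),
    Glyph("i", 80.0, 100.0, 3.0, 7.0),
    Glyph("O", 72.0, 112.0, 8.0, 7.0),
    Glyph("k", 81.0, 112.0, 6.0, 7.0),
]
print(typographic_features(page, page_width=612.0, page_height=792.0))
```

Taking medians rather than means keeps a single unusually wide gap (for example, between words) from dominating the character-spacing estimate; a fuller treatment would separate inter-word from inter-character gaps.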
  • Keywords
    document image processing; feature extraction; page description languages; spatial data structures; 2D relationships; DVI file; character spacing; data representation; document component attributes; document layout; figure placement; formatting commands; hierarchical representation; image based typographic analysis; line spacing; margins; page language; page layout; page synthesis; read-order; spatial understanding; syntax; Automation; Computer errors; Graphics; Image analysis; Layout; Page description languages; Printers; Printing; Text analysis; Typesetting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Second International Conference on Document Analysis and Recognition, 1993
  • Conference_Location
    Tsukuba Science City
  • Print_ISBN
    0-8186-4960-7
  • Type
    conf
  • DOI
    10.1109/ICDAR.1993.395624
  • Filename
    395624