• DocumentCode
    856486
  • Title
    From paper to office document standard representation
  • Author
    Dengel, Andreas; Bleisinger, Rainer; Hoch, Rainer; Fein, Frank; Hönes, Frank
  • Author_Institution
    German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
  • Volume
    25
  • Issue
    7
  • fYear
    1992
  • fDate
    7/1/1992 12:00:00 AM
  • Firstpage
    63
  • Lastpage
    67
  • Abstract
    The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout objects such as text-blocks, lines, and words based on their presentation on the page. Subsequently, in a step called logical labeling, the layout objects and their compositions are geometrically analyzed to identify corresponding logical objects that can be related to a human-perceptible meaning, such as sender, recipient, and date in a letter. A context-sensitive text recognition for logical objects is then applied using logical vocabularies and syntactic knowledge. As a result, Pi ODA produces a document representation that conforms to the ODA international standard.
  • Keywords
    computerised picture processing; document image processing; context-sensitive text recognition; lines; logical labeling; logical vocabularies; model-based document analysis system; office document architecture; paper interface; single-sided business letters; standard; syntactic knowledge; text-blocks; words; Artificial intelligence; Humans; Image analysis; Information analysis; Information filtering; Information filters; Text analysis; Text recognition; Tree data structures; Vocabulary
  • fLanguage
    English
  • Journal_Title
    Computer
  • Publisher
    IEEE
  • ISSN
    0018-9162
  • Type
    jour
  • DOI
    10.1109/2.144442
  • Filename
    144442
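
The abstract describes a part-of hierarchy of layout objects followed by a geometric logical-labeling step. The following is a minimal illustrative sketch of that idea, not the Pi ODA implementation: the class `LayoutObject`, the function `label_blocks`, the normalized bounding-box convention, and all position thresholds are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LayoutObject:
    """One node of a part-of layout hierarchy (hypothetical structure)."""
    kind: str  # e.g. "page", "text-block", "line", "word"
    # (x0, y0, x1, y1) as fractions of page width/height, origin at top-left
    box: Tuple[float, float, float, float]
    children: List["LayoutObject"] = field(default_factory=list)
    label: Optional[str] = None  # logical label, filled in by label_blocks


def label_blocks(page: LayoutObject) -> LayoutObject:
    """Assign logical labels to text-blocks from their position alone.

    The thresholds are invented for illustration; a real system would
    derive such rules from a document model.
    """
    for block in page.children:
        if block.kind != "text-block":
            continue
        x0, y0, x1, _ = block.box
        if y0 < 0.15 and x1 <= 0.5:
            block.label = "sender"      # top-left corner of the page
        elif 0.15 <= y0 < 0.35 and x1 <= 0.5:
            block.label = "recipient"   # address area below the sender
        elif y0 < 0.35 and x0 >= 0.5:
            block.label = "date"        # upper right half of the page
    return page


# Example page with four text-blocks; the last one matches no rule.
page = LayoutObject("page", (0.0, 0.0, 1.0, 1.0), children=[
    LayoutObject("text-block", (0.10, 0.05, 0.45, 0.10)),
    LayoutObject("text-block", (0.10, 0.20, 0.45, 0.30)),
    LayoutObject("text-block", (0.60, 0.20, 0.90, 0.23)),
    LayoutObject("text-block", (0.10, 0.40, 0.90, 0.80)),
])
labels = [block.label for block in label_blocks(page).children]
```

In this sketch each labeled block could then be passed to a recognizer restricted to a vocabulary for its label, which is the role the abstract assigns to context-sensitive text recognition.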