• DocumentCode
    311136
  • Title

    Experiments on extracting structural information from paper documents using syntactic pattern analysis

  • Author

    Bayer, T.A. ; Walischewski, H.

  • Author_Institution
    Daimler-Benz AG, Ulm, Germany
  • Volume
    1
  • fYear
    1995
  • fDate
    14-16 Aug 1995
  • Firstpage
    476
  • Abstract
    Extracting structural information from paper documents supports daily document processing by, for example, automatically finding index terms and document topics. Knowledge about such components is modeled in a semantic net, which describes geometric properties, spatial relationships, lexical entities, and lexical relationships. The document model is used to extract the sender, date, recipient, and opening and closing formulas from a business letter. A total of 181 business letters were processed, divided into a training set of 20 and a test set of the remaining 161. The error rates for the test set range from 0.022 to 0.049 at an average rejection rate of 0.4. Results show that the computational effort for matching can be limited to O(n^2), given n primitive objects.
  • Keywords
    document image processing; knowledge acquisition; pattern recognition; semantic networks; daily document processing; document topics; error rates; geometric properties; index terms; lexical entities; lexical relationships; paper documents; primitive objects; semantic net; spatial relationships; structural information; syntactic pattern analysis; Artificial intelligence; Data mining; Electronics packaging; Error analysis; Humans; Information analysis; Optical character recognition software; Pattern analysis; Testing; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-8186-7128-9
  • Type

    conf

  • DOI
    10.1109/ICDAR.1995.599039
  • Filename
    599039