  • DocumentCode
    2021184
  • Title
    Learning of Pattern-Based Rules for Document Classification
  • Author
    Dengel, Andreas R.
  • Author_Institution
    Univ. of Kaiserslautern, Kaiserslautern
  • Volume
    1
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    123
  • Lastpage
    127
  • Abstract
    Automatic processing of office documents, such as orders, invoices, or offers, holds significant potential for cost savings. Because such domains have a high percentage of special vocabulary, purely statistical approaches fail in automatic classification. The inherent structure and short text messages of these documents require specific approaches. We propose a rule-based method to classify mixed stacks of documents into a set of hierarchically organized classes. Rules are learned by extracting patterns of different types from a document sample. The paper focuses on the architecture and the learning process, presents results in comparison with other techniques, and gives an outlook on how to further improve the system.
  • Keywords
    document image processing; image classification; knowledge based systems; learning (artificial intelligence); document classification; office documents; pattern-based rules; rule-based method; Cost function; Delay effects; Dispatching; Filtering; Optical character recognition software; Postal services; Routing; Text analysis; Vocabulary; Voting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4378688
  • Filename
    4378688