DocumentCode
2021184
Title
Learning of Pattern-Based Rules for Document Classification
Author
Dengel, Andreas R.
Author_Institution
Univ. of Kaiserslautern, Kaiserslautern
Volume
1
fYear
2007
fDate
23-26 Sept. 2007
Firstpage
123
Lastpage
127
Abstract
Automatic processing of office documents, such as orders, invoices, or offers, holds significant potential for saving costs. Because such domains have a high percentage of special vocabulary, purely statistical approaches fail in automatic classification. The inherent structure and short text messages of these documents require specific approaches. We propose a rule-based method to classify mixed stacks of documents into a set of hierarchically organized classes. Rules are learned by extracting patterns of different types from a document sample. The paper focuses on the architecture and on the learning process, presents results in comparison with other techniques, and gives an outlook on how to further improve the system.
Keywords
document image processing; image classification; knowledge based systems; learning (artificial intelligence); document classification; office documents; pattern-based rules; rule-based method; Cost function; Delay effects; Dispatching; Filtering; Optical character recognition software; Postal services; Routing; Text analysis; Vocabulary; Voting;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location
Parana
ISSN
1520-5363
Print_ISBN
978-0-7695-2822-9
Type
conf
DOI
10.1109/ICDAR.2007.4378688
Filename
4378688