Title :
Clustering and classification of document structure: a machine learning approach
Author :
Dengel, Andreas ; Dubiel, Frank
Author_Institution :
German Res. Center for Artificial Intelligence, Kaiserslautern, Germany
Abstract :
We describe a system that learns the presentation of logical document structures, using business letters as an example. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy. This concept hierarchy then serves as the basis for classifying future input. The paper introduces the individual learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports on the results.
Keywords :
business data processing; classification; document handling; knowledge based systems; learning by example; technical presentation; business letters; concept hierarchy; document logical structure presentation; document structure classification; document structure clustering; logical labeling; machine learning approach; Artificial intelligence; Classification tree analysis; Costs; Decision trees; Fuzzy logic; Information retrieval; Labeling; Logic testing; Machine learning; Text analysis
Conference_Title :
Proceedings of the Third International Conference on Document Analysis and Recognition, 1995
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
DOI :
10.1109/ICDAR.1995.601965