Title :
Understanding multi-articled documents
Author :
Tsujimoto, Shuichi ; Asada, Haruo
Author_Institution :
Toshiba Corp., Kawasaki, Japan
Abstract :
A document understanding method based on the tree representation of document structures is proposed. It is shown that documents have an obvious hierarchical structure in their geometry, which can be represented by a tree. A small number of rules are introduced to transform the geometric structure into the logical structure, which represents the semantics. The virtual field separator technique is employed to exploit the information carried by special constituents of documents, such as field separators and frames, while keeping the number of transformation rules small. Experimental results on a variety of document formats show that the proposed method is applicable to most documents commonly encountered in daily use, although there is still room for further refinement of the transformation rules.
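The idea of rewriting a geometric tree into a logical tree can be illustrated with a minimal sketch. The abstract does not state the actual transformation rules, so everything here is an assumption made for illustration: the Node class, the "head"/"body" tags, and the single rule in to_logical (a head block opens a group and the body blocks that follow it become its children) are hypothetical, and the virtual field separator technique is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                      # block name, e.g. "page" or "para A1"
    kind: str = "body"              # hypothetical tag: "head" or "body"
    children: List["Node"] = field(default_factory=list)


def to_logical(geo: Node) -> Node:
    """Apply one illustrative rule (an assumption, not the paper's rule set):
    among siblings, a 'head' block starts a group and each following
    'body' block is attached beneath the most recent head."""
    out = Node(geo.label, geo.kind)
    current_head = None
    for child in geo.children:
        converted = to_logical(child)
        if converted.kind == "head":
            out.children.append(converted)
            current_head = converted
        elif current_head is not None:
            current_head.children.append(converted)
        else:
            # Body block with no preceding head stays at this level.
            out.children.append(converted)
    return out


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines for inspection."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Geometric structure: a flat reading-order list of blocks on one page.
page = Node("page", children=[
    Node("title A", "head"),
    Node("para A1"),
    Node("para A2"),
    Node("title B", "head"),
    Node("para B1"),
])

# Logical structure: each article groups its paragraphs under its title.
logical = to_logical(page)
print("\n".join(outline(logical)))
```

In this toy case the flat page of five blocks becomes two articles, each with its paragraphs nested under its title, which is the kind of multi-article grouping the title refers to.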
Keywords :
document image processing; pattern recognition; trees (mathematics); document understanding; geometric structure; logical structure; semantics; tree representation; virtual field separator; Abstracts; Desktop publishing; Humans; Image analysis; Natural languages; Particle separators; Research and development; Sections; Text analysis
Conference_Title :
Pattern Recognition, 1990. Proceedings., 10th International Conference on
Conference_Location :
Atlantic City, NJ
Print_ISBN :
0-8186-2062-5
DOI :
10.1109/ICPR.1990.118163