Title :
Page grammars and page parsing. A syntactic approach to document layout recognition
Author_Institution :
Hitachi Dublin Lab., Trinity Coll., Dublin, Ireland
Abstract :
This paper describes a syntactic approach to deducing the logical structure of printed documents from their physical layout. Page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser parses segmented page images according to that grammar. This process is part of a system that reads scanned document images and produces computer-readable text in a logical mark-up format such as SGML. The system is briefly outlined, the grammar formalism and the parsing algorithm are described in detail, and some experimental results are reported.
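The idea of parsing segmented page blocks with a two-dimensional grammar can be illustrated with a minimal sketch. It is not the paper's formalism or algorithm: the rule format, the `above` and `left_of` relations, the block labels, the pixel tolerance `tol`, and every name below are assumptions made for illustration. Coordinates are image coordinates, with y growing downward. The sketch does not track which blocks each edge covers, so it only shows how an agenda-driven chart combines adjacent regions bottom-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A labelled rectangular region of the page (a chart edge)."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


# Hypothetical binary rules: (parent, spatial relation, first child, second child).
RULES = [
    ("Header", "above", "Title", "Author"),
    ("Body", "left_of", "Column", "Column"),
    ("Page", "above", "Header", "Body"),
]


def related(rel, a, b, tol=5):
    """True if region a stands in relation rel to region b (assumed tolerance)."""
    if rel == "above":
        # b starts within tol pixels below a, and the two overlap horizontally.
        return 0 <= b.y0 - a.y1 <= tol and a.x0 < b.x1 and b.x0 < a.x1
    if rel == "left_of":
        # b starts within tol pixels right of a, and the two overlap vertically.
        return 0 <= b.x0 - a.x1 <= tol and a.y0 < b.y1 and b.y0 < a.y1
    raise ValueError(f"unknown relation: {rel}")


def combine(label, a, b):
    """Build the parent edge as the bounding box of its two children."""
    return Edge(label, min(a.x0, b.x0), min(a.y0, b.y0),
                max(a.x1, b.x1), max(a.y1, b.y1))


def parse(blocks):
    """Bottom-up chart parse: keep combining edges until nothing new appears."""
    chart = set(blocks)
    agenda = list(blocks)
    while agenda:
        new = agenda.pop()
        for other in list(chart):
            # Try the new edge as either the first or the second child.
            for first, second in ((new, other), (other, new)):
                for parent, rel, left, right in RULES:
                    if (first.label == left and second.label == right
                            and related(rel, first, second)):
                        edge = combine(parent, first, second)
                        if edge not in chart:
                            chart.add(edge)
                            agenda.append(edge)
    return chart


blocks = [
    Edge("Title", 0, 0, 100, 10),
    Edge("Author", 0, 12, 100, 20),
    Edge("Column", 0, 25, 48, 100),
    Edge("Column", 52, 25, 100, 100),
]
chart = parse(blocks)
```

A page is accepted under these assumed rules when the chart contains a `Page` edge spanning the whole image. A fuller parser would also record the child edges of each parent so that the logical structure tree can be read off the chart.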
Keywords :
context-free grammars; document image processing; image recognition; page description languages; 2D grammar; SGML; chart parser; computer-readable text; context-free string grammar; document layout recognition; logical document structure deduction; logical mark-up format; page grammars; page layout; page parsing; scanned document images; segmented page images; syntactic approach; Character recognition; Educational institutions; Graphics; Image segmentation; Indexing; Laboratories; Layout; Text recognition; Tree graphs
Conference_Title :
Proceedings of the Second International Conference on Document Analysis and Recognition, 1993
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395626