Title :
Stochastic attributed K-d tree modeling of technical paper title pages
Author :
Mao, Song ; Rosenfeld, Azriel ; Kanungo, Tapas
Author_Institution :
Nat. Libr. of Med., Bethesda, MD, USA
Abstract :
Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous regions such as columns and paragraphs; these regions are called physical components, and they define the physical layout of the page. In this paper we develop a class of models for the physical layouts of technical paper title pages. We model physical layout using hidden semi-Markov models for directional projections of page regions, and a stochastic attributed K-d tree grammar model for the 2D hierarchical structure of these regions. We use the models to generate sets of synthetic title page images in three distinctive styles, which we then use in controlled experiments on page structure analysis.
Keywords :
hidden Markov models; image retrieval; 2D hierarchical structure; document page; hidden semi-Markov models; homogeneous regions; image indexing; physical components; stochastic attributed K-d tree modeling; structured query processing; synthetic title page images; technical paper title pages
Conference_Title :
Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
Print_ISBN :
0-7803-7750-8
DOI :
10.1109/ICIP.2003.1247016