DocumentCode
398585
Title
Stochastic attributed K-d tree modeling of technical paper title pages
Author
Mao, Song ; Rosenfeld, Azriel ; Kanungo, Tapas
Author_Institution
Nat. Libr. of Med., Bethesda, MD, USA
Volume
1
fYear
2003
fDate
14-17 Sept. 2003
Abstract
Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous regions such as columns and paragraphs; these regions are called physical components, and they define the physical layout of the page. In this paper, we develop a class of models for the physical layouts of technical paper title pages. We model physical layout using hidden semi-Markov models for directional projections of page regions, and a stochastic attributed K-d tree grammar model for the 2D hierarchical structure of these regions. We use the models to generate sets of synthetic title page images in three distinctive styles, which we then use in controlled experiments on page structure analysis.
Keywords
hidden Markov models; image retrieval; 2D hierarchical structure; document page; hidden semi-Markov models; homogeneous regions; image indexing; physical components; stochastic attributed K-d tree modeling; structured query processing; synthetic title page images; technical paper title pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
ISSN
1522-4880
Print_ISBN
0-7803-7750-8
Type
conf
DOI
10.1109/ICIP.2003.1247016
Filename
1247016