Title :
On the Reading of Tables of Contents
Author :
Sarkar, Prateek ; Saund, Eric
Author_Institution :
Palo Alto Res. Center, Palo Alto, CA
Abstract :
This paper presents a framework for understanding tables of contents (TOC) of books, journals, and magazines. We propose a universal logical structure representation in terms of a hierarchy of entries, each of which may contain a descriptor and a locator. We enumerate graphical and perceptual cues that guide the parsing of tables of contents in terms of this formalism. We make initial suggestions about the form of evaluation metrics for comparing ground-truthed tables of contents with the output of recognition algorithms. Typical and atypical tables of contents are used throughout to illustrate significant phenomena that must be dealt with in principled ways in any general TOC interpretation scheme. Finally, we discuss the implications of our observations for the design of recognition algorithms.
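The hierarchy of entries described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `TocEntry`, the helper `walk`, the field names, and the sample data are hypothetical and are not taken from the paper, which may define its representation differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class TocEntry:
    """One TOC entry (hypothetical name); descriptor and locator are both optional."""
    descriptor: Optional[str] = None   # e.g. a chapter or article title
    locator: Optional[str] = None      # e.g. a page number, kept as text
    children: List["TocEntry"] = field(default_factory=list)


def walk(entry: TocEntry, depth: int = 0) -> Iterator[Tuple[int, TocEntry]]:
    """Yield (depth, entry) pairs in reading order (pre-order traversal)."""
    yield depth, entry
    for child in entry.children:
        yield from walk(child, depth + 1)


# Made-up example data for illustration only.
toc = TocEntry(children=[
    TocEntry("Part I", children=[
        TocEntry("Introduction", "1"),
        TocEntry("Background", "17"),
    ]),
    TocEntry("Index", "203"),
])

# Flatten the tree into (depth, descriptor, locator) triples.
flat = [(d, e.descriptor, e.locator) for d, e in walk(toc)]
```

Making both fields optional lets one structure hold a root with no text of its own, a section heading with a descriptor but no locator, and a leaf entry with both.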
Keywords :
information analysis; evaluation metrics; recognition algorithms design; tables of contents; universal logical structure representation; functional role labeling; ground truth; layout analysis; structure extraction
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
DOI :
10.1109/DAS.2008.87