Automated detection and segmentation of table of contents page from document images

Author

Mandal, S. ; Chowdhury, S.P. ; Das, A.K. ; Chanda, Bhabatosh

Author_Institution

Bengal Eng. Coll., Howrah, India

fYear

2003

fDate

3-6 Aug. 2003

Firstpage

398

Abstract

Extracting structural information from the table of contents (TOC) is useful when building a digital document library, and it first requires identifying and segmenting the TOC page. A digital document library aims to provide a cheap, flexible, non-labour-intensive way of storing, representing and managing paper documents in electronic form, so as to facilitate indexing, viewing, printing and extraction of the intended portions. Information extracted from the TOC pages is to be used in a document database for effective retrieval of the required pages. We present a fully automatic method for identifying and segmenting a table of contents (TOC) page from a scanned document.

Keywords

character recognition; digital libraries; document image processing; image segmentation; information retrieval; visual databases; TOC page identification; automated detection; automatic identification; digital document library development; document database; document image segmentation; document images; electronic form; information extraction; nonlabour intensive document storage; page segmentation; paper document; scanned document; structural information; table of contents detection

fLanguage

English

Publisher

IEEE

Conference_Titel

Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on

Print_ISBN

0-7695-1960-1

Type

conf

DOI

10.1109/ICDAR.2003.1227697

Filename

1227697