Title :
A Pixel Labeling Approach for Historical Digitized Books
Author :
Mehri, Milad ; Heroux, Pierre ; Gomez-Kramer, Petra ; Boucher, Alain ; Mullot, Remy
Author_Institution :
L3I, Univ. of La Rochelle, La Rochelle, France
Abstract :
In the context of historical collection conservation and worldwide dissemination, this paper presents an automatic approach to historical book page layout segmentation. We propose to search for homogeneous regions in the content of historical digitized books, with little a priori knowledge, by extracting and analyzing texture features. The novelty of this work lies in the unsupervised clustering of the extracted texture descriptors to find homogeneous regions (i.e., graphic and textual regions), performed on an entire book instead of on each page individually. First, we characterize the content of an entire book by extracting the texture information of each page, since our goal is to compare and index the content of digitized books. The texture features, computed without any hypothesis on the document structure, are based on two non-parametric tools: the autocorrelation function and multiresolution analysis. Second, we apply an unsupervised clustering approach to the extracted features in order to automatically classify the homogeneous regions of book pages. The clustering results are assessed by internal and external accuracy measures. The overall results are satisfactory. Such an analysis could help to build a computer-aided page categorization tool.
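The abstract describes a two-step pipeline: per-pixel texture descriptors from the autocorrelation function at several resolutions, then unsupervised clustering run on the descriptors of a whole book at once. The sketch below illustrates that idea only. It is not the authors' method, because the abstract does not specify their exact descriptors or clustering algorithm. All names (`DIRECTIONS`, `downsample`, `window`, `autocorr`, `pixel_features`, `kmeans`, `label_book`) and parameters (`levels`, window radius `r`, cluster count `k`) are hypothetical. As stand-ins, it uses unit-lag autocorrelation in four directions, a 2x2-averaging pyramid for the multiresolution step, and plain k-means for the clustering.

```python
# Hypothetical sketch: autocorrelation texture descriptors over a simple image
# pyramid, clustered with k-means over ALL pages of a book at once.
# Pages are grayscale images given as lists of rows of floats.

# Unit displacements (dy, dx): horizontal, one diagonal, vertical, other diagonal.
DIRECTIONS = [(0, 1), (-1, 1), (1, 0), (1, 1)]


def downsample(img):
    """One multiresolution level: halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def window(img, cy, cx, r):
    """Square window of radius r around (cy, cx), clipped at the image borders."""
    h, w = len(img), len(img[0])
    return [row[max(0, cx - r):min(w, cx + r + 1)]
            for row in img[max(0, cy - r):min(h, cy + r + 1)]]


def autocorr(win, dy, dx):
    """Autocorrelation at (dy, dx), normalized by the sum of squared deviations."""
    vals = [v for row in win for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals)
    if var == 0:  # flat window (e.g. blank background): no texture
        return 0.0
    h, w = len(win), len(win[0])
    s = 0.0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                s += (win[y][x] - mean) * (win[y2][x2] - mean)
    return s / var


def pixel_features(img, levels=2, r=2):
    """Per-pixel descriptor: 4 directional autocorrelations at each pyramid level."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    feats = []
    for y in range(len(img)):
        for x in range(len(img[0])):
            f = []
            for lvl, im in enumerate(pyramid):
                # Map the pixel to its position at this resolution.
                cy = min(y >> lvl, len(im) - 1)
                cx = min(x >> lvl, len(im[0]) - 1)
                win = window(im, cy, cx, r)
                f.extend(autocorr(win, dy, dx) for dy, dx in DIRECTIONS)
            feats.append(f)
    return feats


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(d2(p, c) for c in centers))))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_book(pages, k=2, levels=2, r=2):
    """Cluster descriptors pooled over the whole book, then split labels per page."""
    per_page = [pixel_features(p, levels, r) for p in pages]
    pooled = [f for feats in per_page for f in feats]
    labels = kmeans(pooled, k)
    out, i = [], 0
    for page, feats in zip(pages, per_page):
        w = len(page[0])
        flat = labels[i:i + len(feats)]
        i += len(feats)
        out.append([flat[y * w:(y + 1) * w] for y in range(len(page))])
    return out
```

For example, `label_book([page1, page2], k=2)` returns one grid of cluster indices per page. The descriptors are pooled across pages before clustering, so a given index denotes the same texture class on every page of the book. Clustering each page separately would not guarantee this, and it is the property the abstract highlights as the approach's novelty.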
Keywords :
document image processing; feature extraction; history; image texture; pattern clustering; publishing; unsupervised learning; autocorrelation function; computer-aided categorization tool; document structure; graphic region; historical book page layout segmentation; historical collection conservation; historical digitized books; homogeneous region search; multiresolution analysis; pixel labeling approach; textual region; texture descriptors; texture feature analysis; texture feature extraction; unsupervised clustering approach; Accuracy; Correlation; Feature extraction; Image segmentation; Indexes; Layout; Historical books; autocorrelation; clustering accuracy metrics; consensus clustering; homogeneity; multiresolution; pixel labeling; texture;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.167