DocumentCode
3340893
Title
Structural Mixtures for Statistical Layout Analysis
Author
Shafait, Faisal ; van Beusekom, J. ; Keysers, Daniel ; Breuel, Thomas M.
Author_Institution
German Res. Center for Artificial Intell., Image Understanding & Pattern Recognition, Kaiserslautern
fYear
2008
fDate
16-19 Sept. 2008
Firstpage
415
Lastpage
422
Abstract
A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.
Keywords
Gaussian distribution; document image processing; image matching; image segmentation; statistical analysis; MARG dataset; Manhattan layout; confidence score; document images; document layout; geometric layout analysis; multivariate Gaussian distribution; page layout; page segmentation; probabilistic matching algorithm; statistical layout analysis; structural mixture model; Artificial intelligence; Books; Image analysis; Image segmentation; Inspection; Large-scale systems; Pattern analysis; Pattern recognition; Probability; Text analysis; Layout analysis; structural mixture model
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location
Nara
Print_ISBN
978-0-7695-3337-7
Type
conf
DOI
10.1109/DAS.2008.61
Filename
4669989