DocumentCode :
3340893
Title :
Structural Mixtures for Statistical Layout Analysis
Author :
Shafait, Faisal ; van Beusekom, J. ; Keysers, Daniel ; Breuel, Thomas M.
Author_Institution :
German Research Center for Artificial Intelligence, Image Understanding & Pattern Recognition, Kaiserslautern
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
415
Lastpage :
422
Abstract :
A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.
Keywords :
Gaussian distribution; document image processing; image matching; image segmentation; statistical analysis; MARG dataset; Manhattan layout; confidence score; document images; document layout; geometric layout analysis; multivariate Gaussian distribution; page layout; page segmentation; probabilistic matching algorithm; statistical layout analysis; structural mixture model; Artificial intelligence; Books; Image analysis; Inspection; Large-scale systems; Pattern analysis; Pattern recognition; Probability; Text analysis; Layout analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.61
Filename :
4669989