• DocumentCode
    3340893
  • Title
    Structural Mixtures for Statistical Layout Analysis
  • Author
    Shafait, Faisal ; van Beusekom, J. ; Keysers, Daniel ; Breuel, Thomas M.
  • Author_Institution
    German Research Center for Artificial Intelligence, Image Understanding & Pattern Recognition, Kaiserslautern
  • fYear
    2008
  • fDate
    16-19 Sept. 2008
  • Firstpage
    415
  • Lastpage
    422
  • Abstract
    A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.
  • Keywords
    Gaussian distribution; document image processing; image matching; image segmentation; statistical analysis; MARG dataset; Manhattan layout; confidence score; document images; document layout; geometric layout analysis; multivariate Gaussian distribution; page layout; page segmentation; probabilistic matching algorithm; statistical layout analysis; structural mixture model; Artificial intelligence; Books; Image analysis; Inspection; Large-scale systems; Pattern analysis; Pattern recognition; Probability; Text analysis; Layout analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
  • Conference_Location
    Nara
  • Print_ISBN
    978-0-7695-3337-7
  • Type
    conf
  • DOI
    10.1109/DAS.2008.61
  • Filename
    4669989