• DocumentCode
    1638117
  • Title
    A Realistic Dataset for Performance Evaluation of Document Layout Analysis
  • Author
    Antonacopoulos, A. ; Bridson, D. ; Papadopoulos, C. ; Pletschacher, S.
  • Author_Institution
    Res. Lab. Sch. of Comput., Sci. & Eng., Univ. of Salford, Manchester, UK
  • fYear
    2009
  • Firstpage
    296
  • Lastpage
    300
  • Abstract
    There is a significant need for a realistic dataset on which to evaluate layout analysis methods and examine their performance in detail. This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents. Strong emphasis is placed on comprehensive and detailed representation of both complex and simple layouts, and on colour originals. In-depth information is recorded both at the page and region level. Ground truth is efficiently created using a new semi-automated tool and stored in a new comprehensive XML representation, the PAGE format. The dataset can be browsed and searched via a Web-based front end to the underlying database and suitable subsets (relevant to specific evaluation goals) can be selected and downloaded.
  • Keywords
    XML; document handling; online front-ends; software performance evaluation; PAGE format; Web-based front end; comprehensive XML representation; contemporary documents; document layout analysis; performance evaluation; realistic dataset; Data engineering; Databases; Image analysis; Image color analysis; Image recognition; Pattern analysis; Pattern recognition; Performance analysis; Text analysis; XML; Performance evaluation; datasets; ground truth format; layout analysis; page segmentation; region classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.271
  • Filename
    5277696