  • DocumentCode
    3487748
  • Title
    The Significance of Reading Order in Document Recognition and Its Evaluation
  • Author
    Clausner, C. ; Pletschacher, S. ; Antonacopoulos, A.
  • Author_Institution
    Pattern Recognition & Image Anal. (PRImA) Res. Lab., Univ. of Salford, Salford, UK
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    688
  • Lastpage
    692
  • Abstract
    Reading order detection and representation is an important task in many digitisation scenarios involving the preservation of the logical structure of a document. The corresponding need to evaluate reading order results generated by layout analysis methods poses a particular challenge due to potential deviations between the ground truth and the actually detected segmentation of the page. To this end, a novel evaluation approach is proposed that responds to this problem by incorporating region correspondence analysis. Furthermore, a sophisticated reading order representation scheme is presented and used by the system, allowing the grouping of objects with ordered and/or unordered relations. This is a typical requirement for documents with complex layouts such as magazines and newspapers. The evaluation method has been validated using the results of two state-of-the-art OCR / layout analysis systems and a basic top-to-bottom reading order detection algorithm applied to representative samples from the PRImA contemporary and the IMPACT historical document datasets.
  • Keywords
    document image processing; image representation; image segmentation; optical character recognition; IMPACT historical document datasets; OCR; PRImA contemporary; basic top-to-bottom reading order detection algorithm; document recognition; document segmentation; layout analysis methods; logical structure; novel evaluation approach; reading order representation scheme; region correspondence analysis; Engines; Layout; Optical character recognition software; Performance evaluation; Text analysis; Text recognition; document layout analysis; document structure; performance evaluation; reading order detection; reading order evaluation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.141
  • Filename
    6628706