  • DocumentCode
    2020970
  • Title
    XML Data Representation in Document Image Analysis
  • Author
    Belaïd, Abdel ; Falk, Ingrid ; Rangoni, Yves
  • Author_Institution
    Univ. Nancy 2, Vandoeuvre-lès-Nancy
  • Volume
    1
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    78
  • Lastpage
    82
  • Abstract
    This paper presents the XML-based formats ALTO, TEI, and METS, which are used in digital libraries, and discusses their relevance to data representation in a document image analysis and recognition (DIAR) process. The first part briefly presents these formats, focusing on their suitability for the structural representation and modeling of DIAR data. The second part shows how the formats can be used in a reverse engineering process and describes their implementation as a data representation framework.
  • Keywords
    XML; document image processing; image recognition; image representation; ALTO; METS; TEI; XML data representation; XML-based formats; digital libraries; document image analysis; document image recognition; structural modeling; structural representation; Encoding; Guidelines; Image analysis; Optical character recognition software; Reverse engineering; Software libraries; Text analysis; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type
    conf
  • DOI
    10.1109/ICDAR.2007.4378679
  • Filename
    4378679