• DocumentCode
    2733938
  • Title

    A Framework for the Encoding of Multilayered Documents

  • Author

    Eldakar, Youssef ; Adly, Noha ; Nagi, Magdy

  • Author_Institution
    Bibliotheca Alexandrina, Alexandria
  • fYear
    2006
  • fDate
    6-6 Dec. 2006
  • Firstpage
    306
  • Lastpage
    313
  • Abstract
    Electronic publishing of material digitized using imaging and OCR calls for a special delivery format capable of reconstructing original documents in a well-usable electronic form. We present a framework for the universal encoding of multilingual image-on-text documents, enabling retrieval systems to text-search and highlight hits on original page images. A generalized format for representation of image-on-text allows for integration of different OCR engines and target format encoders. This framework's current implementation encodes multilingual content into DjVu and PDF. Performance has been evaluated with focus on file size and shown that overhead of adding text layers is small compared to advantages and that output is comparable to other systems.
  • Keywords
    document image processing; OCR; electronic publishing; multilayered document encoding; multilingual image-on-text documents; retrieval systems; special delivery format; Books; Electronic publishing; Encoding; Humans; Image coding; Image reconstruction; Image retrieval; Optical character recognition software; Software libraries; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2006 1st International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    1-4244-0682-X
  • Type

    conf

  • DOI
    10.1109/ICDIM.2007.369215
  • Filename
    4221907