DocumentCode
2733938
Title
A Framework for the Encoding of Multilayered Documents
Author
Eldakar, Youssef ; Adly, Noha ; Nagi, Magdy
Author_Institution
Bibliotheca Alexandrina, Alexandria
fYear
2006
fDate
6-6 Dec. 2006
Firstpage
306
Lastpage
313
Abstract
Electronic publishing of material digitized using imaging and OCR calls for a special delivery format capable of reconstructing original documents in a usable electronic form. We present a framework for the universal encoding of multilingual image-on-text documents, enabling retrieval systems to text-search and highlight hits on original page images. A generalized format for the representation of image-on-text allows for the integration of different OCR engines and target format encoders. The framework's current implementation encodes multilingual content into DjVu and PDF. Performance has been evaluated with a focus on file size, showing that the overhead of adding text layers is small compared to the advantages and that the output is comparable to that of other systems.
Keywords
document image processing; OCR; electronic publishing; multilayered document encoding; multilingual image-on-text documents; retrieval systems; special delivery format; Books; Electronic publishing; Encoding; Humans; Image coding; Image reconstruction; Image retrieval; Optical character recognition software; Software libraries; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management, 2006 1st International Conference on
Conference_Location
Bangalore
Print_ISBN
1-4244-0682-X
Type
conf
DOI
10.1109/ICDIM.2007.369215
Filename
4221907