DocumentCode :
2733938
Title :
A Framework for the Encoding of Multilayered Documents
Author :
Eldakar, Youssef ; Adly, Noha ; Nagi, Magdy
Author_Institution :
Bibliotheca Alexandrina, Alexandria
fYear :
2006
fDate :
6-6 Dec. 2006
Firstpage :
306
Lastpage :
313
Abstract :
Electronic publishing of material digitized using imaging and OCR calls for a special delivery format capable of reconstructing original documents in a usable electronic form. We present a framework for the universal encoding of multilingual image-on-text documents, enabling retrieval systems to text-search and highlight hits on original page images. A generalized format for the representation of image-on-text allows for the integration of different OCR engines and target format encoders. The framework's current implementation encodes multilingual content into DjVu and PDF. Performance has been evaluated with a focus on file size; the results show that the overhead of adding text layers is small compared to the advantages and that the output is comparable to that of other systems.
Keywords :
document image processing; OCR; electronic publishing; multilayered document encoding; multilingual image-on-text documents; retrieval systems; special delivery format; Books; Electronic publishing; Encoding; Humans; Image coding; Image reconstruction; Image retrieval; Optical character recognition software; Software libraries; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital Information Management, 2006 1st International Conference on
Conference_Location :
Bangalore
Print_ISBN :
1-4244-0682-X
Type :
conf
DOI :
10.1109/ICDIM.2007.369215
Filename :
4221907