DocumentCode :
1638739
Title :
OCD: An Optimized and Canonical Document Format
Author :
Bloechle, Jean-Luc ; Lalanne, Denis ; Ingold, Rolf
Author_Institution :
Dept. of Inf., Univ. of Fribourg, Fribourg, Switzerland
fYear :
2009
Firstpage :
236
Lastpage :
240
Abstract :
Revealing and being able to manipulate the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we present OCD, an optimized, easy-to-process and canonical format for representing structured electronic documents. The system and methods used for reverse engineering PDF documents into the OCD format are presented as well as the techniques to optimize it. We finally expose concrete evaluations of our OCD format compactness and restructuring performances.
Keywords :
document handling; optimisation; reverse engineering; OCD format; PDF document; canonical document format; structured electronic document representation; Concrete; Informatics; Labeling; Optimization methods; Performance evaluation; Reverse engineering; Speech synthesis; Standards publication; Text analysis; Text processing; OCD; PDF; XCDF; XML; logical structure; physical structure; reverse-engineering
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.159
Filename :
5277720