DocumentCode :
2884698
Title :
Conversion of PDF documents into HTML: a case study of document image analysis
Author :
Rahman, Fuad ; Alam, Hassan
Author_Institution :
BCL Technol. Inc., Santa Clara, CA, USA
Volume :
1
fYear :
2003
fDate :
9-12 Nov. 2003
Firstpage :
87
Abstract :
Portable Document Format (PDF) has become the de facto standard in many fields because it is independent of local formatting restrictions and reproduces documents accurately. HTML documents, meanwhile, are becoming an integral part of our lives as the dominant form of information exchange on the World Wide Web. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows these documents to be converted into HTML while keeping their logical and physical layout intact.
Keywords :
Internet; document image processing; hypermedia markup languages; HTML; PDF documents; World Wide Web; document image analysis; hypertext markup language; image-processing techniques; information exchange; portable document format; Algorithm design and analysis; Computer aided software engineering; HTML; Image analysis; Image converters; Meteorological radar; Reproducibility of results; Space technology; Text analysis; White spaces;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference on
Print_ISBN :
0-7803-8104-1
Type :
conf
DOI :
10.1109/ACSSC.2003.1291873
Filename :
1291873