DocumentCode
2884698
Title
Conversion of PDF documents into HTML: a case study of document image analysis
Author
Rahman, Fuad ; Alam, Hassan
Author_Institution
BCL Technol. Inc., Santa Clara, CA, USA
Volume
1
fYear
2003
fDate
9-12 Nov. 2003
Firstpage
87
Abstract
Portable Document Format (PDF) has become the de facto standard in many fields because of its independence from local formatting restrictions and its accurate reproducibility. HTML documents, on the other hand, are becoming an integral part of our lives as the dominant form of information exchange on the World Wide Web. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows these documents to be converted into HTML while keeping the logical and physical layout intact.
Keywords
Internet; document image processing; hypermedia markup languages; HTML; PDF documents; World Wide Web; document image analysis; hypertext markup language; image-processing techniques; information exchange; portable document format; Algorithm design and analysis; Computer aided software engineering; HTML; Image analysis; Image converters; Meteorological radar; Reproducibility of results; Space technology; Text analysis; White spaces;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
Print_ISBN
0-7803-8104-1
Type
conf
DOI
10.1109/ACSSC.2003.1291873
Filename
1291873