  • DocumentCode
    2884698
  • Title
    Conversion of PDF documents into HTML: a case study of document image analysis
  • Author
    Rahman, Fuad; Alam, Hassan
  • Author_Institution
    BCL Technol. Inc., Santa Clara, CA, USA
  • Volume
    1
  • fYear
    2003
  • fDate
    9-12 Nov. 2003
  • Firstpage
    87
  • Abstract
    Portable Document Format (PDF) has become the de facto standard in many fields because of its independence from local formatting restrictions and its accurate reproducibility. HTML documents, on the other hand, are becoming an integral part of our lives as the dominant form of information exchange on the World Wide Web. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows these documents to be converted into HTML with their logical and physical layout intact.
  • Keywords
    Internet; document image processing; hypermedia markup languages; HTML; PDF documents; World Wide Web; document image analysis; hypertext markup language; image-processing techniques; information exchange; portable document format; Algorithm design and analysis; Computer aided software engineering; Image analysis; Image converters; Meteorological radar; Reproducibility of results; Space technology; Text analysis; White spaces
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
  • Print_ISBN
    0-7803-8104-1
  • Type
    conf
  • DOI
    10.1109/ACSSC.2003.1291873
  • Filename
    1291873