• DocumentCode
    2146120
  • Title

    A Table Detection Method for Multipage PDF Documents via Visual Seperators and Tabular Structures

  • Author

    Fang, Jing ; Gao, Liangcai ; Bai, Kun ; Qiu, Ruiheng ; Tao, Xin ; Tang, Zhi

  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    779
  • Lastpage
    783
  • Abstract
    Table detection is an important task in document analysis and recognition. In this paper, we propose a novel and effective table detection method for PDF documents based on visual separators and geometric content layout information. The visual separators include not only graphic ruling lines but also white spaces, so the method handles tables both with and without ruling lines. Furthermore, we detect page columns to assist table region delimitation in pages with complex layouts. Evaluations of our algorithm on an e-Book dataset and a scientific document dataset show competitive performance. Notably, the proposed method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.
  • Keywords
    document handling; electronic publishing; commercial software package; document analysis; document recognition; e-book dataset; geometric content layout information; multipage PDF document; page column detection; scientific document dataset; table detection method; table region delimitation; tabular structure; visual separator; Electronic publishing; Layout; Particle separators; Portable document format; Text analysis; White spaces; PDF documents; ruling lines; separators; table detection; table spotting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.304
  • Filename
    6065417