DocumentCode
2146120
Title
A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures
Author
Fang, Jing ; Gao, Liangcai ; Bai, Kun ; Qiu, Ruiheng ; Tao, Xin ; Tang, Zhi
Author_Institution
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
779
Lastpage
783
Abstract
Table detection is an important task in document analysis and recognition. In this paper, we propose a novel and effective table detection method for PDF documents that uses visual separators and geometric content layout information. The visual separators include not only graphic ruling lines but also white spaces, so that tables both with and without ruling lines can be handled. Furthermore, we detect page columns to assist table region delimitation on pages with complex layouts. Evaluations of our algorithm on an e-Book dataset and a scientific document dataset show competitive performance. Notably, the proposed method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.
Keywords
document handling; electronic publishing; commercial software package; document analysis; document recognition; e-book dataset; geometric content layout information; multipage PDF document; page column detection; scientific document dataset; table detection method; table region delimitation; tabular structure; visual separator; Electronic publishing; Layout; Particle separators; Portable document format; Text analysis; White spaces; PDF documents; ruling lines; separators; table detection; table spotting;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.304
Filename
6065417