Title :
Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents
Author :
Xin Tao ; Zhi Tang ; Canhui Xu ; Liangcai Gao
Author_Institution :
Institute of Computer Science &amp; Technology, Peking University, Beijing, China
Abstract :
In this paper, a new dataset is proposed for page layout analysis of born-digital documents. The document contents are extracted uniformly, and an XML-based data format is designed to represent both raw data and structure data. Using a self-developed ground-truthing tool, a public dataset is constructed from document resources in diverse styles. Automatic performance evaluation methods, covering both physical segmentation and logical labeling, are adapted to cope with different scenarios. Applications of the proposed dataset show that it is suitable for evaluating a variety of layout analysis tasks.
Keywords :
XML; data structures; document handling; performance evaluation; born-digital documents; data format; data structure; ground-truth; page layout analysis; raw data; Image segmentation; Labeling; Layout; Portable document format; Text analysis; born-digital document; dataset; ground-truthing;
Conference_Title :
2014 11th IAPR International Workshop on Document Analysis Systems (DAS)
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
DOI :
10.1109/DAS.2014.37