DocumentCode :
153388
Title :
Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents
Author :
Xin Tao ; Zhi Tang ; Canhui Xu ; Liangcai Gao
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
fYear :
2014
fDate :
7-10 April 2014
Firstpage :
247
Lastpage :
251
Abstract :
In this paper, a new dataset is proposed for page layout analysis of born-digital documents. By uniformly extracting the document contents, an XML-based data format is designed in terms of raw data and structure data. Using a self-developed ground-truthing tool, a public dataset is constructed from document resources of diverse styles. Taking both physical segmentation and logical labeling into account, automatic performance evaluation methods are adjusted to cope with different scenarios. Applications of the proposed dataset show that it is suitable for evaluating various layout analysis tasks.
Keywords :
XML; data structures; document handling; performance evaluation; born-digital documents; data format; data structure; ground-truth; page layout analysis; raw data; Image segmentation; Labeling; Layout; Portable document format; Text analysis; born-digital document; dataset; ground-truthing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
Conference_Location :
Tours
Print_ISBN :
978-1-4799-3243-6
Type :
conf
DOI :
10.1109/DAS.2014.37
Filename :
6831007