Title :
An automatic performance evaluation method for document page segmentation
Author :
Peng, Liangrui ; Chen, Ming ; Liu, Changsong ; Ding, Xiaoqing ; Zheng, Jirong
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fDate :
2001
Abstract :
Automatic performance evaluation of document page segmentation modules is necessary because OCR products are used to process large volumes of documents with complex layouts, especially newspapers. The paper presents a region-based method that evaluates the performance of a page segmentation module by analyzing the geometric relationships between regions in the segmentation results and regions in a preset ground-truth. The ground-truth serves not only as the correct answer to page segmentation but also as the comparison benchmark for the automatic evaluation, so it is subject to stricter geometric constraints. The region-matching algorithm searches the segmentation results for a region equal to each region in the ground-truth. The performance parameters are then calculated from the matching results. An experiment tests two page segmentation modules in a popular Chinese OCR product, THOCR2000, and the results show that the method is effective.
Keywords :
document image processing; geometry; image segmentation; optical character recognition; set theory; OCR products; THOCR2000; automatic performance evaluation method; comparison benchmark; document page segmentation; geometric constraints; geometric region relationships; ground-truth; newspapers; region-based method; region-matching algorithm; Benchmark testing; Error analysis; Image segmentation; Intelligent systems; Laboratories; Large-scale systems; Optical character recognition software; Performance analysis; Pixel; System performance;
Conference_Titel :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953769