DocumentCode :
3488686
Title :
Hybrid Page Segmentation with Efficient Whitespace Rectangles Extraction and Grouping
Author :
Kai Chen ; Fei Yin ; Cheng-Lin Liu
Author_Institution :
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Beijing, China
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
958
Lastpage :
962
Abstract :
Page segmentation is still a challenging problem due to the large variety of document layouts. Methods examining both foreground and background regions are among the most effective to solve this problem. However, their performance is influenced by the implementation of two key steps: the extraction and selection of background regions, and the grouping of background regions into separators. This paper proposes an efficient hybrid method for page segmentation. The method extracts white space rectangles based on connected component analysis, and filters white space rectangles progressively incorporating foreground and background information such that the remaining rectangles are likely to form column separators. Experimental results on the ICDAR2009 page segmentation competition test set demonstrate the effectiveness and superiority of the proposed method.
Keywords :
document image processing; feature extraction; image segmentation; background information; background region extraction; background region selection; column separators; connected component analysis; document layouts; foreground information; foreground region; hybrid page segmentation; white space rectangles; whitespace rectangle extraction; whitespace rectangle grouping; Image segmentation; Joining processes; Layout; Optical character recognition software; Particle separators; Text analysis; page segmentation; whitespace rectangles extraction; whitespace rectangles grouping
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.194
Filename :
6628759