DocumentCode :
2630867
Title :
Document image segmentation and text area ordering
Author :
Saitoh, Takashi ; Tachikawa, Michiyoshi ; Yamaai, Toshifumi
Author_Institution :
Ricoh R&D Group, Yokohama, Kanagawa, Japan
fYear :
1993
fDate :
20-22 Oct 1993
Firstpage :
323
Lastpage :
329
Abstract :
A system for document image segmentation and text area ordering is described and applied to complex printed page layouts in both Japanese and English. No assumption is made about the shape of blocks, so the segmentation technique can handle not only skewed images without skew correction but also documents whose columns are not rectangular. The technique follows a bottom-up strategy: connected components are extracted from the reduced image and classified according to their local information. The connected components are merged into lines, and the lines are merged into areas. Extracted text areas are classified as body, caption, header, or footer. A tree graph of the layout of the body texts is built, and the order of the texts is obtained by preorder traversal of the graph. The authors introduce the influence range of each node, a procedure for the title part, and extraction of the white horizontal separator, which make it possible to get good results on various documents. The total system is fast and compact.
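The ordering step in the abstract (a preorder traversal of a tree of body text areas) can be illustrated with a minimal sketch. The abstract does not say how the tree is built, so the `TextArea` class, the `reading_order` function, and the hand-built example page below are hypothetical and stand in only for the traversal itself, not for the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextArea:
    """Hypothetical node: one body text area in the layout tree."""
    label: str
    children: List["TextArea"] = field(default_factory=list)


def reading_order(root: TextArea) -> List[str]:
    """Return area labels in preorder: a node first, then its children left to right."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.label)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


# Assumed example layout: a title above two columns, each with two stacked areas.
page = TextArea("title", [
    TextArea("col1-top", [TextArea("col1-bottom")]),
    TextArea("col2-top", [TextArea("col2-bottom")]),
])

print(reading_order(page))
```

In this assumed layout the traversal finishes the first column before moving to the second, which is the behavior a preorder walk gives when each area's children are the areas that follow it in the same column.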
Keywords :
document handling; document image processing; feature extraction; image classification; image segmentation; word processing; English complex printed page layouts; Japanese; body texts; bottom-up strategy; connected components; document image segmentation; influence range; local information; preorder traversal; segmentation technique; text area ordering; text areas; tree graph; white horizontal separator; Data mining; Image converters; Image segmentation; Optical character recognition software; Particle separators; Partitioning algorithms; Research and development; Shape; Streaming media; Tree graphs;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
Type :
conf
DOI :
10.1109/ICDAR.1993.395722
Filename :
395722