Title :
Word spotting in Chinese document images without layout analysis
Author :
Lu, Yue ; Tan, Chew Lim
Author_Institution :
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore
Abstract :
This paper proposes an approach to searching for user-specified words/phrases in Chinese document images without requiring layout analysis. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character to search for a matching candidate in the document. Once a matching candidate is found, its adjacent characters in the horizontal and vertical directions are examined for matches with the corresponding characters in the user-specified word/phrase, subject to constraints on positional relation and size similarity. Character matching is done in two stages: a coarse matching stage based on stroke density features, followed by a second matching stage for which a weighted Hausdorff distance is proposed. Experimental results show that the proposed method can effectively locate user-specified Chinese words/phrases in horizontal or vertical text lines of document images.
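The two-stage character matching described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact feature layout, thresholds, or weighting scheme, so everything specific here is an assumption made for illustration: the grid-based stroke density, the tolerance `tol`, the optional `weight` function, and all function names are hypothetical and are not the authors' implementation. Character images are assumed to be binary, given as lists of rows with 1 for ink and 0 for background.

```python
import math

def stroke_density(img, grid=2):
    """Fraction of ink pixels in each cell of a grid x grid partition (assumed feature)."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            area = (y1 - y0) * (x1 - x0)
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(ink / area if area else 0.0)
    return feats

def coarse_match(f1, f2, tol=0.25):
    """Stage 1: accept a candidate only if every density cell differs by at most tol."""
    return max(abs(a - b) for a, b in zip(f1, f2)) <= tol

def ink_points(img):
    """Coordinates (row, col) of all ink pixels."""
    return [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v]

def directed_whd(A, B, weight=None):
    """Weighted mean of nearest-neighbour distances from points of A to the set B.

    With weight=None every point counts equally; the paper's actual
    weighting is not specified in the abstract.
    """
    if not A or not B:
        return math.inf
    total, wsum = 0.0, 0.0
    for a in A:
        w = weight(a) if weight else 1.0
        total += w * min(math.dist(a, b) for b in B)
        wsum += w
    return total / wsum if wsum else math.inf

def weighted_hausdorff(A, B, weight=None):
    """Stage 2: symmetric distance, the larger of the two directed values."""
    return max(directed_whd(A, B, weight), directed_whd(B, A, weight))

def match(query_img, cand_img, tol=0.25, max_dist=1.0):
    """Run the cheap density test first, then the point-set distance."""
    if not coarse_match(stroke_density(query_img), stroke_density(cand_img), tol):
        return False
    return weighted_hausdorff(ink_points(query_img), ink_points(cand_img)) <= max_dist
```

In this sketch the coarse stage cheaply discards most candidates so that the more expensive point-set comparison only runs on the survivors, which is the usual motivation for a two-stage design; the threshold `max_dist` is likewise an illustrative value, not one taken from the paper.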
Keywords :
character recognition; document image processing; feature extraction; pattern matching; Chinese document image processing; bounding boxes; character matching; coarse matching; connected component analysis; positional relation; size similarity; stroke density features; weighted Hausdorff distance; word spotting; Character recognition; Computer science; Content based retrieval; Costs; Image analysis; Image retrieval; Image storage; Indexing; Information retrieval; Optical character recognition software;
Conference_Title :
Pattern Recognition, 2002. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-1695-X
DOI :
10.1109/ICPR.2002.1047794