Title :
Eigenspace method for text retrieval in historical document images
Author :
Terasawa, Kengo ; Nagasaki, Takeshi ; Kawashima, Toshio
Author_Institution :
Sch. of Syst. Inf. Sci., Future Univ., Hokkaido, Japan
fDate :
29 Aug.-1 Sept. 2005
Abstract :
A new method for text retrieval that does not require segmentation is described. Segmenting the images of historical documents into individual characters is difficult, so conventional OCR methods, which rely on segmentation, do not work well. Our method instead divides the text image into a sequence of small slits. The region corresponding to the query image is retrieved by solving a matching problem between these slit sequences. Applying the eigenspace method to the slit images lets us solve this matching problem efficiently, and using dynamic time warping (DTW) further improves the results. Our method is more accurate than simple template matching and has a far lower computational cost.
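The abstract outlines three steps: cut the text-line image into slits, project each slit into a low-dimensional eigenspace, and match the query's slit sequence against the target's with DTW. What follows is a minimal Python/NumPy sketch of that pipeline under stated assumptions, not the authors' implementation. The slit width and step, the number of eigenvectors k, fitting the eigenspace on the target line alone, and the open-begin/open-end DTW variant are illustrative choices not taken from the paper, and all function names are hypothetical. Comparing k-dimensional projections instead of raw slit pixels is what keeps each distance computation cheap.

```python
import numpy as np

# Assumed parameters (not from the paper): width=4, step=2, k=8.


def to_slits(image, width=4, step=2):
    """Cut a text-line image (H x W array) into vertical slits, each flattened to a vector."""
    starts = range(0, image.shape[1] - width + 1, step)
    return np.stack([image[:, s:s + width].ravel() for s in starts]).astype(float)


def fit_eigenspace(slits, k):
    """Return the mean slit and the top-k principal axes (PCA via SVD)."""
    mean = slits.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(slits - mean, full_matrices=False)
    return mean, vt[:k]


def project(slits, mean, basis):
    """Map each slit vector to its k eigenspace coordinates."""
    return (slits - mean) @ basis.T


def subsequence_dtw(query, target):
    """Align the whole query sequence to the best-matching span of the target.

    The query may start and end at any target position. Returns the
    accumulated cost and the index of the target slit where the match ends.
    """
    n, m = len(query), len(target)
    # Pairwise Euclidean distances between projected slits: shape (n, m).
    dist = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=2)
    acc = np.full((n, m), np.inf)
    acc[0] = dist[0]  # free starting point anywhere in the target
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    end = int(np.argmin(acc[-1]))  # free end point
    return float(acc[-1, end]), end


# Usage on synthetic data: the query is a crop of a random "text line".
rng = np.random.default_rng(0)
line = rng.random((16, 200))
query_img = line[:, 40:80]

target_slits = to_slits(line)
mean, basis = fit_eigenspace(target_slits, k=8)
target_vecs = project(target_slits, mean, basis)
query_vecs = project(to_slits(query_img), mean, basis)

cost, end = subsequence_dtw(query_vecs, target_vecs)
# Expected: 0.0 38 (the last matched slit starts at column 38 * 2 = 76).
print(round(cost, 6), end)
```

Because the whole query must be consumed while its start and end positions in the target are free, the same routine can locate the query anywhere in a longer line. The start position could be recovered by backtracking through the accumulated-cost matrix, which this sketch omits.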
Keywords :
document image processing; history; image matching; image retrieval; text analysis; dynamic time warping; eigenspace method; historical document images; image matching problem; image segmentation; query image region; template matching method; text retrieval; Character recognition; Computational efficiency; Image matching; Image retrieval; Image segmentation; Information retrieval; Information science; Natural languages; Optical character recognition software; Text recognition;
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.99