DocumentCode :
344189
Title :
A document image retrieval method tolerating recognition and segmentation errors of OCR using shape-feature and multiple candidates
Author :
Kameshiro, Taizo ; Hirano, Takashi ; Okada, Yasuhiro ; Yoda, Fumio
Author_Institution :
Inf. Technol. R&D Center, Mitsubishi Electr. Corp., Kanagawa, Japan
fYear :
1999
fDate :
20-22 Sep 1999
Firstpage :
681
Lastpage :
684
Abstract :
There are document image retrieval methods that are robust to character recognition errors. Some of them tolerate recognition errors by having multiple candidates for a character image, but they are intolerant of segmentation errors of characters. In addition, these methods cannot retrieve documents that do not contain the correct character code. We propose a method that overcomes these problems. This method uses multiple candidates and “shape-feature” which describes the outline of the character shape for uncertain characters. Documents are retrieved using both “shape-feature” and multiple candidate techniques. Our experimental results reveal that the method has a high recall rate compared with that of conventional methods.
Keywords :
document image processing; image retrieval; image segmentation; optical character recognition; visual databases; OCR; character recognition errors; document image retrieval; experimental results; image recognition; image segmentation; multiple candidates; shape-feature; uncertain characters; Character recognition; Computer errors; Image converters; Image databases; Image recognition; Image retrieval; Image segmentation; Image storage; Optical character recognition software; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
Type :
conf
DOI :
10.1109/ICDAR.1999.791879
Filename :
791879