DocumentCode
344189
Title
A document image retrieval method tolerating recognition and segmentation errors of OCR using shape-feature and multiple candidates
Author
Kameshiro, Taizo ; Hirano, Takashi ; Okada, Yasuhiro ; Yoda, Fumio
Author_Institution
Inf. Technol. R&D Center, Mitsubishi Electr. Corp., Kanagawa, Japan
fYear
1999
fDate
20-22 Sep 1999
Firstpage
681
Lastpage
684
Abstract
Some document image retrieval methods are robust to character recognition errors. Some of them tolerate recognition errors by keeping multiple candidates for each character image, but they do not tolerate character segmentation errors. In addition, these methods cannot retrieve documents whose candidates do not contain the correct character code. We propose a method that overcomes these problems. For uncertain characters, the method uses multiple candidates together with a “shape-feature”, which describes the outline of the character shape. Documents are retrieved using both the “shape-feature” and the multiple-candidate techniques. Our experimental results show that the method achieves a higher recall rate than conventional methods.
Keywords
document image processing; image retrieval; image segmentation; optical character recognition; visual databases; OCR; character recognition errors; document image retrieval; experimental results; image recognition; image segmentation; multiple candidates; shape-feature; uncertain characters; Character recognition; Computer errors; Image converters; Image databases; Image recognition; Image retrieval; Image segmentation; Image storage; Optical character recognition software; Spatial databases
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location
Bangalore
Print_ISBN
0-7695-0318-7
Type
conf
DOI
10.1109/ICDAR.1999.791879
Filename
791879