DocumentCode
2029553
Title
Document retrieval system tolerant of segmentation errors of document images
Author
Nagasaki, Takeshi ; Takahashi, Toshikazu ; Marukawa, Katsumi
Author_Institution
Central Res. Laboratory, Hitachi, Ltd., Tokyo, Japan
fYear
2004
fDate
26-29 Oct. 2004
Firstpage
280
Lastpage
285
Abstract
This paper describes a new document retrieval method that is tolerant of OCR segmentation errors in document images. To overcome the segmentation and recognition errors that most OCR-based retrieval systems suffer from, the proposed method consists of two processing phases. First, the OCR engine generates multiple character-segmentation and recognition hypotheses. Then, the retrieval engine extracts keywords from the recognition hypotheses by using lexicon-driven dynamic programming (DP) matching. We have applied this method to both handwritten and printed document images and have demonstrated its effectiveness in reducing false drops and false alarms.
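The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the lattice format (edges of the form `(start, end, char, cost)` between hypothetical cut points), the function name `match_keyword`, and the costs are all assumptions made for illustration, and the matching is simplified to exact character agreement rather than a full lexicon-driven DP with substitution penalties. The example lattice encodes the classic ambiguity in which one image region can be read either as "m" or as "r" followed by "n".

```python
from collections import defaultdict
from math import inf

def match_keyword(edges, keyword):
    """Return (cost, start, end) of the cheapest lattice path that spells
    `keyword`, or None if no path does. Assumes a non-empty keyword.

    `edges` is a hypothetical segmentation lattice: each item is
    (start, end, char, cost), meaning the image span between cut points
    `start` and `end` was recognized as `char` with the given cost.
    """
    by_start = defaultdict(list)
    nodes = set()
    for s, e, ch, c in edges:
        by_start[s].append((e, ch, c))
        nodes.update((s, e))

    best = None
    for origin in sorted(nodes):
        # state[node] = lowest cost of spelling the keyword prefix consumed
        # so far along some path from `origin` to `node`.
        state = {origin: 0.0}
        for target in keyword:
            nxt = {}
            for node, cost in state.items():
                for e, ch, c in by_start[node]:
                    if ch == target and cost + c < nxt.get(e, inf):
                        nxt[e] = cost + c
            state = nxt
            if not state:
                break  # no hypothesis continues the keyword from here
        for end, cost in state.items():
            if best is None or cost < best[0]:
                best = (cost, origin, end)
    return best

# Cut points 0..4; the span 2-4 is either "m" or "r" + "n".
lattice = [
    (0, 1, "c", 0.1),
    (1, 2, "o", 0.1),
    (2, 4, "m", 0.3),
    (2, 3, "r", 0.4),
    (3, 4, "n", 0.4),
]

print(match_keyword(lattice, "corn"))
print(match_keyword(lattice, "com"))
print(match_keyword(lattice, "cat"))
```

A single best-path OCR output would keep only the cheaper reading ("com") and miss a query for "corn"; keeping both hypotheses in the lattice lets the keyword be found anyway, which is the kind of missed match the abstract calls a false drop.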
Keywords
document image processing; dynamic programming; feature extraction; image segmentation; information retrieval; optical character recognition; OCR engine; document images segmentation; document retrieval system; lexicon-driven dynamic programming matching; multiple character-segmentation; recognition hypotheses; retrieval engine extraction; Character generation; Character recognition; Dictionaries; Dynamic programming; Engines; Image retrieval; Image segmentation; Image sequence analysis; Laboratories; Optical character recognition software
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004. Ninth International Workshop on
ISSN
1550-5235
Print_ISBN
0-7695-2187-8
Type
conf
DOI
10.1109/IWFHR.2004.36
Filename
1363924