DocumentCode :
1801167
Title :
Robust retrieval of noisy text
Author :
Lopresti, Daniel P.
Author_Institution :
Matsushita Information Technol. Lab., Panasonic Technols. Inc., Princeton, NJ, USA
fYear :
1996
fDate :
13-15 May 1996
Firstpage :
76
Lastpage :
85
Abstract :
We examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their “sharp” counterparts.
Keywords :
Boolean functions; fuzzy logic; information retrieval; optical character recognition; query processing; string matching; Boolean query models; approximate string matching; fuzzy logic; fuzzy retrieval method; imprecise data; information retrieval; noisy text retrieval; query evaluations; recall precision analysis; simulated OCR errors; Computer errors; Content based retrieval; Databases; Fuzzy logic; Information retrieval; Information technology; Laboratories; Noise robustness; Optical character recognition software; Query processing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Libraries, 1996. ADL '96., Proceedings of the Third Forum on Research and Technology Advances in
Conference_Location :
Washington, DC
Print_ISBN :
0-8186-7403-2
Type :
conf
DOI :
10.1109/ADL.1996.502518
Filename :
502518