DocumentCode :
3141043
Title :
Models and algorithms for duplicate document detection
Author :
Lopresti, Daniel P.
Author_Institution :
Lucent Technol. Inc., AT&T Bell Labs., Murray Hill, NJ, USA
fYear :
1999
fDate :
20-22 Sep 1999
Firstpage :
297
Lastpage :
300
Abstract :
This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
Keywords :
string matching; visual databases; approximate string matching; document image databases; duplicate document detection; real-world degradation effects; Data mining; Electrical capacitance tomography; Feature extraction; Image databases; Information management; Microwave integrated circuits; Optical character recognition software; Packaging; Spatial databases; Turning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
Type :
conf
DOI :
10.1109/ICDAR.1999.791783
Filename :
791783