Regional Information Center for Science and Technology - Models and algorithms for duplicate document detection

DocumentCode :

3141043

Title :

Models and algorithms for duplicate document detection

Author :

Lopresti, Daniel P.

Author_Institution :

Lucent Technol. Inc., AT&T Bell Labs., Murray Hill, NJ, USA

fYear :

1999

fDate :

20-22 Sep 1999

Firstpage :

297

Lastpage :

300

Abstract :

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
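The abstract does not spell out the four models or their algorithms. As a generic illustration of the approximate string matching family it draws on, here is a minimal sketch of the classic Levenshtein edit distance (dynamic programming over insertions, deletions, and substitutions) applied to OCR-style text. The `is_probable_duplicate` rule and its `threshold` default are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_probable_duplicate(a: str, b: str, threshold: float = 0.1) -> bool:
    """Hypothetical decision rule (not from the paper): treat two OCR'd texts
    as duplicates when the edit distance, normalized by the longer length,
    is at most threshold."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty texts are trivially identical
    return edit_distance(a, b) / longest <= threshold
```

For example, `edit_distance("kitten", "sitting")` is 3. A degraded copy such as `"dupIicate docurnent"` (an "l" misread as "I", an "m" split into "rn") is also at distance 3 from `"duplicate document"`, about 0.16 after normalization, so it counts as a duplicate with `threshold=0.2` but not with the default 0.1. Normalizing by length keeps the score comparable across short and long documents.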

Keywords :

string matching; visual databases; approximate string matching; document image databases; duplicate document detection; real-world degradation effects; Data mining; Electrical capacitance tomography; Feature extraction; Image databases; Information management; Microwave integrated circuits; Optical character recognition software; Packaging; Spatial databases; Turning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on

Conference_Location :

Bangalore

Print_ISBN :

0-7695-0318-7

Type :

conf

DOI :

10.1109/ICDAR.1999.791783

Filename :

791783

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3141043