Regional Information Center for Science and Technology - The detection of duplicates in document image databases

DocumentCode :

2061168

Title :

The detection of duplicates in document image databases

Author :

Doermann, David ; Li, Huiping ; Kia, Omid

Author_Institution :

Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA

Volume :

fYear :

1997

fDate :

18-20 Aug 1997

Firstpage :

314

Abstract :

We propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust “signature” extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR or other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test the scalability, we have developed a simulator which allows us to change parameters of the system and examine performance for millions of document signatures. A complete system is implemented and tested on a test collection of technical articles and memos.

Keywords :

document image processing; image recognition; very large databases; visual databases; document image databases; document signatures; duplicate detection; imaging distortions; memos; previously processed documents; robust signature; technical articles; test collection; very large image databases; Database systems; Educational institutions; Filters; Image databases; Image retrieval; Image storage; Indexes; Laboratories; Robustness; System testing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997

Conference_Location :

Ulm

Print_ISBN :

0-8186-7898-4

Type :

conf

DOI :

10.1109/ICDAR.1997.619863

Filename :

619863

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2061168