DocumentCode
3309812
Title
ASPDD: An Efficient and Scalable Framework for Duplication Detection
Author
Latha, K. ; Rajmohan, B. ; Rajaram, R.
Author_Institution
Comput. Sci. & Eng. Dept., Anna Univ. Tiruchirappalli, Tiruchirappalli, India
fYear
2010
fDate
20-21 June 2010
Firstpage
153
Lastpage
157
Abstract
This paper introduces a framework for the duplicate document detection problem that applies an efficient dynamic programming algorithm, All Pairs Shortest Path, to the text collection. Our goal is to investigate the phenomenon and determine the approach that minimizes the impact of duplicates on search results. We show that our approach scales with the number of documents and works well for documents from all domains. We compared our solution to the state of the art and found that our method produced promising results: in addition to improving the accuracy of exact duplicate detection, it also detected partial and neighbor replicas. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
Keywords
Character generation; Computer science; Costs; Degradation; Educational institutions; Fingerprint recognition; Information technology; Robustness; Sorting; Wildlife; All Pairs Shortest Path; Degradation Effects; Duplicate Document Detection; Neighbor Replica; Partial Replica;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computer Engineering (ACE), 2010 International Conference on
Conference_Location
Bangalore, Karnataka, India
Print_ISBN
978-1-4244-7154-6
Type
conf
DOI
10.1109/ACE.2010.61
Filename
5532856