DocumentCode :
1992797
Title :
Robust and Efficient Near-Duplicate Documents Detection on P2P Networks
Author :
Zhang, Yan ; Dai, Xuebing ; Wang, Caojing ; Wei, Zhiqiang
Author_Institution :
Ocean Univ. of China, Qingdao, China
fYear :
2012
fDate :
27-30 May 2012
Firstpage :
1
Lastpage :
4
Abstract :
Near-duplicate document detection has received much attention from both industry and the research community in recent years. Various methods have been proposed to identify near-duplicate documents, which share the same content except for a small fraction of differences. Most research work assumes a centralized system and focuses on the accuracy and efficiency of the detection method. However, techniques for detecting near-duplicate documents in a distributed system are largely lacking. In this paper, we propose a Robust and Efficient Near-duplICate document detection method on a P2P network (RENDIC), which improves scalability and performance owing to properties inherited from the P2P network. Experimental results demonstrate that RENDIC achieves higher effectiveness than existing methods.
Keywords :
content management; document handling; peer-to-peer computing; P2P network inheritance; RENDIC; centralized system; content sharing; distributed system; efficient near-duplicate document detection; fraction difference; robust near-duplicate document detection; Accuracy; Bandwidth; Feature extraction; Robustness; Routing; Vectors; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Engineering and Technology (S-CET), 2012 Spring Congress on
Conference_Location :
Xian
Print_ISBN :
978-1-4577-1965-3
Type :
conf
DOI :
10.1109/SCET.2012.6342140
Filename :
6342140