Author :
Zeilemaker, Niels ; Pouwelse, Johan ; Sips, Henk
Abstract :
In recent years, fully decentralized file sharing systems have been developed with the aim of improving anonymity among their users. These systems provide typical file sharing features such as searching for and downloading files. However, the elaborate schemes originally aimed at improving anonymity make partial keyword matching virtually impossible, or introduce a substantial bandwidth overhead. In this paper we introduce 4P, a system that provides users with anonymous search on top of a semantic overlay. The semantic overlay allows users to efficiently locate files using partial keyword matching, without having to resort to an expensive flooding operation. 4P includes a number of privacy-enhancing features, such as probabilistic query forwarding, path uncertainty, caching, and encrypted links. Moreover, we integrate a content retrieval channel into our protocol, allowing users to start downloading a file from multiple sources immediately, without requiring all intermediate nodes to cache a complete copy. Using a trace-based dataset, we mimic a real-world query workload and show the cost and performance of search using six overlay configurations, comparing random, semantic, Gnutella, RetroShare, and OneSwarm to 4P. The state-of-the-art flooding-based alternatives required approximately 10,000 messages to be sent per query; in contrast, 4P required only 313. This shows that while flooding can achieve a high recall (more than 85% in our experiments), it is prohibitively expensive. With 4P we achieve a recall of 76% at a considerable reduction in messages sent.
Keywords :
cache storage; cryptography; data privacy; peer-to-peer computing; probability; 4P; anonymity improvement; anonymous search; caching; content retrieval channel; decentralized file sharing systems; encrypted links; overlay configurations; partial keyword matching; path uncertainty; performant private peer-to-peer file sharing; privacy enhancing features; probabilistic query forwarding; query workload; semantic overlay; trace-based dataset; Bandwidth; Cryptography; Peer-to-peer computing; Polynomials; Privacy; Probabilistic logic; Semantics;