Title : 
PeerDedupe: Insights into the Peer-Assisted Sampling Deduplication

Author :
Xing, Yuanjian; Li, Zhenhua; Dai, Yafei

Author_Institution :
Department of Computer Science and Technology, Peking University, Beijing, China

Abstract :
As digital data grows rapidly toward a worldwide storage crisis, data deduplication is playing an increasingly prominent role in data storage. Driven by the problems of mainstream server-side deduplication schemes, there has recently been a trend toward introducing peer-assisted methods into deduplication systems. However, this topic remains poorly understood and lacks thorough research. In this paper, we conduct an in-depth, quantitative investigation of peer-assisted deduplication. Through measurements, we observe that inter-peer duplication accounts for a large proportion of total duplication and exhibits strong peer locality. Based on these observations, we propose PeerDedupe, a novel peer-assisted sampling deduplication approach. Experiments show that PeerDedupe can remove over 98% of duplication with each peer coordinating with no more than 5 other peers, and that it requires much less server RAM than existing work.

Keywords :
data compression; peer-to-peer computing; random-access storage; sampling methods; storage management; PeerDedupe; RAM usage; data deduplication; digital data; inter-peer duplication account; mainstream server-side deduplication; peer-assisted sampling deduplication; Accuracy; Estimation; Greedy algorithms; Redundancy; Sampling methods; Servers; Weibull distribution;

Conference_Title :
Peer-to-Peer Computing (P2P), 2010 IEEE Tenth International Conference on

Conference_Location :
Delft

Print_ISBN :
978-1-4244-7140-9

Electronic_ISBN :
978-1-4244-7139-3

DOI :
10.1109/P2P.2010.5570004