• DocumentCode
    2954433
  • Title

    Efficient search in file-sharing networks

  • Author

    Burstein, Paul ; Smith, Alan Jay

  • Author_Institution
    EECS Dept., Univ. of California Berkeley, Berkeley, CA
  • Volume
    2
  • fYear
    2007
  • fDate
    5-7 Dec. 2007
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    Currently, the most popular file-sharing applications have used either centralized or flood-based search algorithms. Napster has been successful in providing a centralized index with presumably perfect query recall. Succeeding, more distributed protocols like Gnutella and Kazaa have used flood-based search procedures in which a query is propagated through an unstructured network. Query result quality in such networks suffers as queries for items that are rare have high probabilities of not being found as the entire corpus is not covered during a search. In this paper, we present a new and improved implementation of a distributed file-sharing system yielding (1) query result quality better than flooding and close to a centralized index, and (2) low-maintenance network overhead. These improvements result from our optimized approaches to (a) high churn rates (clients and servers frequently entering and leaving the system) and (b) skewed workloads (high variation in access frequencies vs. key). High churn rates are addressed by keeping all data in soft state, which is periodically refreshed, such that the loss of a server or client is quickly reflected in the indexes; higher refresh rates imply fewer false positives. Skewed workloads are load balanced with the use of a layer of indirection for placing and locating data, such that data is partitioned and distributed based on the frequency of use. A trace-driven prototype evaluation based on Gnutella system traces shows that our prototype implementation achieves a low network bandwidth, attains max-average load ratios within a factor of three across all servers, and has positive recall values for over 90% of all queries, despite a high churn rate; the recall would be 100% absent churn.
  • Keywords
    distributed algorithms; peer-to-peer computing; protocols; search problems; telecommunication network management; distributed file-sharing network; distributed protocol; flood-based search algorithm; low-maintenance network overhead; query result quality;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Systems, 2007 International Conference on
  • Conference_Location
    Hsinchu
  • ISSN
    1521-9097
  • Print_ISBN
    978-1-4244-1889-3
  • Electronic_ISBN
    1521-9097
  • Type

    conf

  • DOI
    10.1109/ICPADS.2007.4447742
  • Filename
    4447742