• DocumentCode
    2006916
  • Title
    Distributed Optimization Strategies for Mining on Peer-to-Peer Networks
  • Author
    Dutta, Haimonti; Matthur, Ananda
  • Author_Institution
    Center for Computational Learning Systems, Columbia University, New York, NY, USA
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    350
  • Lastpage
    355
  • Abstract
    Peer-to-peer (P2P) networks are distributed systems in which nodes with equal roles and capabilities exchange information and services directly with each other. In recent years, they have become a popular way to share large amounts of data. However, such an architecture adds a new dimension to the process of knowledge discovery and data mining: the challenge of mining distributed (and often dynamic) sources of data and computing. Furthermore, the effective utilization of distributed resources needs to be carefully analyzed. In this paper, we study the problem of optimizing resources to enable efficient and scalable mining on a P2P network. We develop a crawler based on the Gnutella protocol and use it to simulate a P2P network on which we run a classification task. Our results from the case study indicate that resources are utilized effectively and that the accuracy of the distributed mining algorithm is likely to be close to that of the hypothetical scenario in which all data in the network is stored in a central location.
  • Keywords
    data mining; peer-to-peer computing; classification task; distributed mining algorithm; distributed optimization strategies; distributed resources; distributed systems; knowledge discovery; peer-to-peer networks; Application software; Computer networks; Costs; Crawlers; Distributed algorithms; Distributed computing; Machine learning; Master-slave; distributed; optimization; peer-to-peer; simplex
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.57
  • Filename
    4724997