  • DocumentCode
    2709731
  • Title
    Scaling up Classifiers to Cloud Computers
  • Author
    Moretti, Christopher ; Steinhaeuser, Karsten ; Thain, Douglas ; Chawla, Nitesh V.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, Notre Dame, IN
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    472
  • Lastpage
    481
  • Abstract
    As the size of available datasets has grown from Megabytes to Gigabytes and now into Terabytes, machine learning algorithms and computing infrastructures have continuously evolved in an effort to keep pace. But at large scales, mining for useful patterns still presents challenges in terms of data management as well as computation. These issues can be addressed by dividing both data and computation to build ensembles of classifiers in a distributed fashion, but trade-offs in cost, performance, and accuracy must be considered when designing or selecting an appropriate architecture. In this paper, we present an abstraction for scalable data mining that allows us to explore these trade-offs. Data and computation are distributed to a computing cloud with minimal effort from the user, and multiple models for data management are available depending on the workload and system configuration. We demonstrate the performance and scalability characteristics of our ensembles using a wide variety of datasets and algorithms on a Condor-based pool with Chirp to handle the storage.
  • Keywords
    data mining; learning (artificial intelligence); cloud computers; computing infrastructures; data management; datasets; machine learning algorithms; scalable data mining; Cloud computing; Computer science; Data engineering; Data mining; Distributed computing; Large-scale systems; Machine learning algorithms; Partitioning algorithms; Scalability; USA Councils; Cloud Computing; Distributed Data Mining; Ensemble Learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.99
  • Filename
    4781142