• DocumentCode
    2544498
  • Title
    A System for Parallel Data Mining Service on Cloud
  • Author
    Tao Chen ; Jidong Chen ; Baoyao Zhou
  • Author_Institution
    EMC Labs. China, Beijing, China
  • fYear
    2012
  • fDate
    1-3 Nov. 2012
  • Firstpage
    329
  • Lastpage
    330
  • Abstract
    We present a cloud-based data mining platform that demonstrates data mining as a service (DMaaS). On the backend, the data processing engine is based on Hadoop, an open-source implementation of Google MapReduce. The data mining algorithms implemented in Apache Mahout are deployed on the platform. Users can access DMaaS from a web browser to analyze general-purpose data mining problems. In this paper, we give an overview of DMaaS, present the system architecture and implementation techniques, and elaborate on a demonstration scenario.
  • Keywords
    cloud computing; data mining; parallel processing; public domain software; Apache Mahout; DMaaS; Google MapReduce; Hadoop; cloud-based data mining platform; data mining-as-a-service; parallel data mining service; Clustering algorithms; Data handling; Data mining; Data processing; Data storage systems; Information management; Vegetation; cloud computing; data mining; hadoop; mahout;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Green Computing (CGC), 2012 Second International Conference on
  • Conference_Location
    Xiangtan
  • Print_ISBN
    978-1-4673-3027-5
  • Type
    conf
  • DOI
    10.1109/CGC.2012.49
  • Filename
    6382837