• DocumentCode
    3089390
  • Title
    K-means Clustering in the Cloud -- A Mahout Test
  • Author
    Esteves, Rui Máximo ; Pais, Rui ; Rong, Chunming
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Stavanger, Stavanger, Norway
  • fYear
    2011
  • fDate
    22-25 March 2011
  • Firstpage
    514
  • Lastpage
    519
  • Abstract
    K-Means is a well-known clustering algorithm that has been successfully applied to a wide variety of problems. However, its application has usually been restricted to small datasets. Mahout is a cloud computing approach to K-Means that runs on a Hadoop system. Both Mahout and Hadoop are free and open source. Because they are inexpensive and scalable, these platforms are a promising technology for solving data-intensive problems that were not trivial in the past. In this work we studied the performance of Mahout using a large dataset. The tests ran on Amazon EC2 instances and allowed us to compare the gain in runtime when running on a multi-node cluster. This paper presents some results of ongoing research.
  • Keywords
    cloud computing; pattern clustering; public domain software; Amazon EC2 instances; Hadoop system; Mahout; cloud computing approach; data intensive problems; data set; k-means clustering algorithm; multinode cluster; open source; Clustering algorithms; Euclidean distance; Machine learning; Machine learning algorithms; Partitioning algorithms; Runtime; K-means; map reduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on
  • Conference_Location
    Biopolis, Singapore
  • Print_ISBN
    978-1-61284-829-7
  • Electronic_ISBN
    978-0-7695-4338-3
  • Type
    conf
  • DOI
    10.1109/WAINA.2011.136
  • Filename
    5763553