Regional Information Center for Science and Technology - CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud

DocumentCode :

3143820

Title :

CloudClustering: Toward an Iterative Data Processing Pattern on the Cloud

Author :

Dave, Ankur ; Lu, Wei ; Jackson, Jared ; Barga, Roger

Author_Institution :

Univ. of California, Berkeley, Berkeley, CA, USA

fYear :

2011

fDate :

16-20 May 2011

Firstpage :

1132

Lastpage :

1137

Abstract :

As the emergence of cloud computing brings the potential for large-scale data analysis to a broader community, architectural patterns for data analysis on the cloud, especially those addressing iterative algorithms, are increasingly useful. MapReduce suffers from performance limitations for this purpose because it is not inherently designed for iterative algorithms. In this paper we describe our implementation of CloudClustering, a distributed k-means clustering algorithm on Microsoft's Windows Azure cloud. The k-means algorithm makes a good case study because its characteristics are representative of many iterative data analysis algorithms. CloudClustering adopts a novel architecture to improve performance without sacrificing fault tolerance. To achieve this goal, we introduce a distributed fault tolerance mechanism called the buddy system, and we make use of data affinity and checkpointing. Our goal is to generalize this architecture into a pattern for large-scale iterative data analysis on the cloud.

Keywords :

checkpointing; cloud computing; data analysis; fault tolerance; iterative methods; pattern clustering; CloudClustering; MapReduce; Microsoft Windows Azure cloud; architectural pattern; distributed fault tolerance mechanism; distributed k-means clustering algorithm; iterative data analysis algorithms; iterative data processing pattern; Algorithm design and analysis; Clustering algorithms; Data analysis; Fabrics; Fault tolerance; Fault tolerant systems; Partitioning algorithms

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on

Conference_Location :

Shanghai

ISSN :

1530-2075

Print_ISBN :

978-1-61284-425-1

Electronic_ISBN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2011.258

Filename :

6008901

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3143820