DocumentCode :
2548967
Title :
Asynchronous Algorithms in MapReduce
Author :
Kambatla, Karthik ; Rapolu, Naresh ; Jagannathan, Suresh ; Grama, Ananth
fYear :
2010
fDate :
20-24 Sept. 2010
Firstpage :
245
Lastpage :
254
Abstract :
Asynchronous algorithms have been demonstrated to improve scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed environments. While the MapReduce programming model has proved to be effective for data-parallel applications, significant questions relating to its performance and application scope remain unresolved. The strict synchronization between map and reduce phases limits expression of asynchrony and hence, does not readily support asynchronous algorithms. This paper investigates the notion of partial synchronizations in iterative MapReduce applications to overcome global synchronization overheads. The proposed approach applies a locality-enhancing partition on the computation. Map tasks execute local computations with (relatively) frequent local synchronizations, with less frequent global synchronizations. This approach yields significant performance gains in distributed environments, even though their serial operation counts are higher. We demonstrate these performance gains on asynchronous algorithms for diverse applications, including pagerank, shortestpath, and kmeans. We make the following specific contributions in the paper(i) we motivate the need to extend MapReduce with constructs for asynchrony, (ii) we propose an API to facilitate partial synchronizations combined with eager scheduling and locality enhancing techniques, and (iii) demonstrate performance improvements from our proposed extensions through a variety of applications from different domains.
Keywords :
application program interfaces; iterative methods; parallel algorithms; parallel programming; synchronisation; API; MapReduce programming model; application program interface; asynchronous algorithm; data parallel application; global synchronization; large scale distributed environment; locality enhancing partition; parallel environment; partial synchronization; Clustering algorithms; Convergence; Optimization; Partitioning algorithms; Programming; Scalability; Synchronization; Asynchronous Algorithms; Distributed Computing; MapReduce;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Cluster Computing (CLUSTER), 2010 IEEE International Conference on
Conference_Location :
Heraklion, Crete
Print_ISBN :
978-1-4244-8373-0
Electronic_ISBN :
978-0-7695-4220-1
Type :
conf
DOI :
10.1109/CLUSTER.2010.30
Filename :
5600303
Link To Document :
بازگشت