DocumentCode :
1914555
Title :
Hadoop Acceleration in an OpenFlow-Based Cluster
Author :
Narayan, S. ; Bailey, Susan ; Daga, Anand
Author_Institution :
InfoBlox Inc., Santa Clara, CA, USA
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
535
Lastpage :
538
Abstract :
This paper presents details of our preliminary study of how Hadoop can control its network resources using OpenFlow in order to improve performance. Hadoop's distributed compute framework, MapReduce, exploits the distributed storage architecture of Hadoop's distributed file system, HDFS, to deliver scalable, reliable parallel processing services for arbitrary algorithms. The shuffle phase of Hadoop's MapReduce computation involves movement of intermediate data from Mappers to Reducers. Reducers are often delayed by inadequate bandwidth between them and the Mappers, which lowers the performance of the cluster. OpenFlow is a popular example of software-defined network (SDN) technology. Our study explores the use of OpenFlow to provide better link bandwidth for shuffle traffic, and thereby decrease the time that Reducers have to wait to gather data from Mappers. Our experiments show a decrease in execution time for a Hadoop job when the shuffle traffic can use more of the available bandwidth on a link. Our approach illustrates how high performance computing applications can improve performance by controlling their underlying network resources. The work presented in this paper is a starting point for experiments being done as part of the SC12 SCinet Research Sandbox, which will quantify the performance advantages of a version of Hadoop that uses OpenFlow to dynamically adjust the network topology of local and wide area Hadoop clusters.
Keywords :
computer networks; distributed databases; network operating systems; parallel processing; public domain software; HDFS; Hadoop MapReduce computation; Hadoop acceleration; Hadoop distributed compute framework; Hadoop distributed file system; Mapper; OpenFlow-based cluster; Reducer; SC12 SCinet Research Sandbox; SDN technology; distributed storage architecture; high performance computing applications; local area Hadoop cluster; network resources; network topology; parallel processing services; shuffle traffic; software-defined network technology; wide area Hadoop cluster; BigData; Hadoop; OpenFlow; SDN; Software Defined Networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
Type :
conf
DOI :
10.1109/SC.Companion.2012.76
Filename :
6495858