DocumentCode :
3256909
Title :
On the Performance of Apache Hadoop in a Tiny Private IaaS Cloud
Author :
Loewen, Gabriel ; Galloway, Michael ; Vrbsky, Susan
Author_Institution :
Dept. of Comput. Sci., Univ. of Alabama, Tuscaloosa, AL, USA
fYear :
2013
fDate :
15-17 April 2013
Firstpage :
189
Lastpage :
195
Abstract :
High-performance and parallel computing are traditionally implemented on very large dedicated compute clusters. However, as many organizations begin to adopt service-oriented cloud-based infrastructures, we can expect to see the development of parallel computing in the cloud. The goal of a parallel compute cluster is to divide a large job into several small jobs, execute the small jobs in parallel on many compute nodes, and then combine the results in some coherent manner. The biggest hurdle in moving this type of service to a cloud-based infrastructure is that performance will undoubtedly be affected by many factors, particularly those related to virtualization in clouds, such as memory and CPU overhead, limited resources, and others relating to hardware virtualization. To fully understand how virtualization can affect parallel computing in a tiny private cloud, we have devised four case studies that examine the performance of Apache Hadoop in varying environments on our private cloud. Our case studies comprise a baseline bare-metal (non-virtualized) cluster deployment consisting of seven nodes, a seven-node virtual machine cluster, a twenty-node virtual machine cluster, and an optimized seven-node virtual machine cluster. Results show that, although small data sets result in comparable job completion times, as the data size increases the performance of Apache Hadoop is affected greatly by virtualization, even when we attempt to optimize the configuration of our cloud.
Keywords :
cloud computing; parallel processing; service-oriented architecture; virtual machines; virtualisation; Apache Hadoop; hardware virtualization; high performance computing; optimized seven-node virtual machine cluster; parallel compute cluster; parallel computing; service-oriented cloud-based infrastructures; tiny private IaaS cloud; twenty-node virtual machine cluster; Cloud computing; Hardware; Metals; Organizations; Parallel processing; Virtual machining; Virtualization; Apache Hadoop; Case Studies; Eucalyptus; Infrastructure-as-a-Service; Kernel-Based Virtual Machine Hypervisor; Parallel Computing; Performance; Virtualization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology: New Generations (ITNG), 2013 Tenth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-0-7695-4967-5
Type :
conf
DOI :
10.1109/ITNG.2013.32
Filename :
6614308