Title :
Optimizing data access latencies in cloud systems by intelligent virtual machine placement
Author :
Alicherry, Mansoor ; Lakshman, T.V.
Author_Institution :
Bell Labs India, Alcatel-Lucent, Bangalore, India
Abstract :
Many cloud applications are data intensive, requiring the processing of large data sets, and the MapReduce/Hadoop architecture has become the de facto processing framework for these applications. Large data sets are stored in data nodes in the cloud, which are typically SAN or NAS devices. Cloud applications process these data sets using a large number of application virtual machines (VMs), with the total completion time being an important performance metric. Many factors affect the total completion time of the processing task, such as the load on the individual servers, the task scheduling mechanism, and communication and data access bottlenecks. One dominant factor that affects completion times for data-intensive applications is the access latency from processing nodes to data nodes. Ideally, one would like to keep all data access local to minimize access latency, but this is often not possible due to the size of the data sets, capacity constraints in processing nodes that prevent VMs from being placed in their ideal locations, and so on. When it is not possible to keep all data access local, one would like to optimize the placement of VMs so that the impact of data access latencies on completion times is minimized. We address this problem of optimized VM placement: given the location of the data sets, we need to determine the locations for placing the VMs so as to minimize data access latencies while satisfying system constraints. We present optimal algorithms for determining the VM locations satisfying various constraints and with objectives that capture natural tradeoffs between minimizing latencies and incurring bandwidth costs. We also consider the problem of incorporating inter-VM latency constraints. In this case, the associated location problem is NP-hard with no effective approximation within a factor of 2 - ϵ for any ϵ > 0.
We discuss an effective heuristic for this case and evaluate by simulation the impact of the various tradeoffs in the optimization objectives.
Keywords :
cloud computing; computational complexity; virtual machines; MapReduce-Hadoop architecture; NAS devices; NP-hard problem; SAN devices; VM; bandwidth costs; capacity constraints; cloud applications; cloud systems; data access latency optimization; data intensive applications; data nodes; intelligent virtual machine placement; interVM latency constraints; large data set processing; latency minimization; optimal algorithms; processing nodes; Approximation algorithms; Approximation methods; Bandwidth; Measurement; Minimization; Optimization; Virtual machining;
Conference_Title :
INFOCOM, 2013 Proceedings IEEE
Conference_Location :
Turin
Print_ISBN :
978-1-4673-5944-3
DOI :
10.1109/INFCOM.2013.6566850