Author_Institution :
Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
Abstract :
As cloud-based computation grows to be an increasingly important paradigm, providing a general computational interface to support datacenter-scale programming has become an imperative research agenda. Many cloud systems use existing virtual machine monitor (VMM) technologies, such as Xen, VMware, and Windows Hypervisor, to multiplex a physical host into multiple virtual hosts and isolate computation on the shared cluster platform. However, traditional multiplexing VMMs do not scale beyond a single physical host, and they alone cannot provide the programming interface and cluster-wide computation that a datacenter system requires. We design a new instruction set architecture, DISA, to unify myriad compute nodes into a big virtual machine called DVM and present programmers with the view of a single computer, where thousands of tasks run concurrently in a large, unified, and snapshotted memory space. The DVM provides a simple yet scalable programming model and mitigates the scalability bottleneck of traditional distributed shared memory systems. Along with an efficient execution engine, the capacity of a DVM can scale up to support large clusters. We have implemented and tested DVM on four platforms, and our evaluation shows that DVM has excellent performance and scalability. On one physical host, the system overhead of DVM is comparable to that of traditional VMMs. On 16 physical hosts, the DVM runs 10 times faster than MapReduce/Hadoop and X10. On 160 compute nodes in the TH-1/GZ supercomputer, the DVM delivers a 12.99× speedup over the computation on 10 compute nodes. The implementation of DVM also allows it to run above traditional VMMs, and we verify that DVM shows linear speedup on a parallelizable workload on 256 large EC2 instances.
Keywords :
application program interfaces; cloud computing; computer centres; concurrency control; instruction sets; virtual machines; virtualisation; DISA; DVM; MapReduce/Hadoop compute nodes; VMM multiplexing; VMM technologies; VMware; Windows Hypervisor; Xen; big-virtual machine; cloud-based computation; cluster-wide computation; concurrent tasks; datacenter-scale programming; distributed shared memory systems; execution engine; general computational interface; instruction set architecture; large-unified-snapshotted memory space; physical host; programming interface; scalable programming model; shared cluster platform; system overhead; virtual hosts; virtual machine monitor technologies; workload parallelization; Distributed systems; concurrent programming; datacenter; virtualization