Title :
Exploring data parallelism and locality in wide area networks
Author :
Gu, Yunhong ; Grossman, Robert
Author_Institution :
Univ. of Illinois at Chicago, Chicago, IL
Abstract :
Cloud computing has demonstrated that, given the right programming model, very large datasets can be processed simply over commodity clusters. Work to date, such as MapReduce and Hadoop, has focused on systems within a single data center. In this paper, we present Sphere, a cloud computing system that targets distributed data-intensive applications over wide area networks. Sphere uses a data-parallel computing model that views the processing of distributed datasets as applying a group of operators to each element in those datasets. As a cloud computing system, Sphere lets application developers write very simple code against its API to process distributed datasets in parallel, while details such as data location, server heterogeneity, load balancing, and fault tolerance remain transparent to them. Unlike MapReduce or Hadoop, Sphere supports distributed data processing on a global scale by exploiting data parallelism and locality in systems over wide area networks.
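The data-parallel model described in the abstract can be pictured as shipping a user-defined operator to each partition of a distributed dataset and applying it element by element. The following is a minimal illustrative sketch of that idea; the name `sphere_process` and the use of a local thread pool are assumptions for demonstration only, not the actual Sphere API.

```python
from concurrent.futures import ThreadPoolExecutor

def sphere_process(partitions, operator):
    """Hypothetical sketch of Sphere's model: apply a user-defined
    operator to every element of every partition. In Sphere the
    partitions live on many servers and the operator is shipped to
    the data; here the partitions are local lists processed by a
    thread pool."""
    with ThreadPoolExecutor() as pool:
        # Each partition is processed independently, element by element.
        results = pool.map(lambda part: [operator(x) for x in part],
                           partitions)
    # Flatten the per-partition results in partition order.
    return [r for part in results for r in part]

# Example: square each record of a "distributed" dataset of 3 partitions.
dataset = [[1, 2, 3], [4, 5], [6]]
print(sphere_process(dataset, lambda x: x * x))  # → [1, 4, 9, 16, 25, 36]
```

In the real system, the scheduler would place each operator invocation on a server near the partition it processes, which is how Sphere exploits locality over wide area networks.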
Keywords :
application program interfaces; data handling; distributed processing; software fault tolerance; wide area networks; Hadoop; MapReduce; Sphere API; cloud computing system; data center; data locality; data locations; data-parallel computing model; distributed data-intensive applications; fault tolerance; load balancing; server heterogeneity; very large datasets; wide area networks; Astronomy; Cloud computing; Computer interfaces; Concurrent computing; Data processing; Distributed computing; Load management; Parallel processing; Pervasive computing; Wide area networks
Conference_Titel :
Workshop on Many-Task Computing on Grids and Supercomputers, 2008 (MTAGS 2008)
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-2872-4
DOI :
10.1109/MTAGS.2008.4777906