DocumentCode
154138
Title
Kylix: A Sparse Allreduce for Commodity Clusters
Author
Zhao, Huasha; Canny, John
Author_Institution
Comput. Sci. Div., Univ. of California, Berkeley, Berkeley, CA, USA
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
273
Lastpage
282
Abstract
Allreduce is a basic building block for parallel computing. Our target here is "Big Data" processing on commodity clusters (mostly sparse power-law data). Allreduce can be used to synchronize models, to maintain distributed datasets, and to perform operations on distributed data such as sparse matrix multiply. We first review a key constraint on cluster communication, the minimum efficient packet size, which hampers the use of direct all-to-all protocols on large networks. Our allreduce network is a nested, heterogeneous-degree butterfly. We show that communication volume in the lower layers is typically much less than in the top layer, and that total communication across all layers is a small constant factor larger than the top layer's, which is close to optimal. A chart of network communication volume across layers has a characteristic "Kylix" shape, which gives the method its name. For optimum performance, the butterfly degrees also decrease down the layers. Furthermore, to route sparse updates efficiently to the nodes that need them, the network must be nested. While the approach is amenable to various kinds of sparse data, almost all "Big Data" sets show power-law statistics, and from the properties of these we derive methods for optimal network design. Finally, we present experiments with Kylix on Amazon EC2 demonstrating significant improvements over existing systems such as PowerGraph and Hadoop.
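For illustration, the layered butterfly structure described above can be sketched with a minimal dense simulation (a hypothetical Python sketch, not the paper's implementation; Kylix additionally nests the layers and scatter-reduces sparse partitions within each group):

import numpy as np

def butterfly_allreduce(vals, degrees):
    # Simulated allreduce over a butterfly with per-layer degrees.
    # vals: one numpy vector per node; prod(degrees) must equal the
    # node count. At layer l, each node sums contributions from the
    # d-1 peers whose mixed-radix node index differs only in digit l,
    # so after all layers every node holds the global sum.
    n = len(vals)
    assert int(np.prod(degrees)) == n
    cur = [v.copy() for v in vals]
    stride = 1
    for d in degrees:
        nxt = []
        for i in range(n):
            digit = (i // stride) % d
            base = i - digit * stride          # index with digit l zeroed
            peers = [base + k * stride for k in range(d)]
            nxt.append(sum(cur[p] for p in peers))
        cur = nxt
        stride *= d
    return cur

# 6 nodes, with degrees decreasing down the layers (3, then 2),
# as the abstract suggests for optimum performance.
vals = [np.full(4, float(i)) for i in range(6)]
out = butterfly_allreduce(vals, [3, 2])
assert all(np.allclose(v, 15.0) for v in out)  # 0+1+...+5 = 15 on every node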
Keywords
Big Data; data mining; graph theory; learning (artificial intelligence); matrix multiplication; network theory (graphs); parallel processing; sparse matrices; synchronisation; tree data structures; Amazon EC2; Big Data processing; characteristic Kylix shape; cluster communication; commodity clusters; communication volume; direct all-to-all protocols; distributed datasets; distributed graph mining; low-layers; machine learning; minimum efficient packet size; model synchronization; nested-heterogeneous-degree butterfly; network communication volume; optimal network design; optimum performance; parallel computing; power-law statistics; sparse allreduce; sparse matrix multiply; sparse power-law data; sparse updates; top-layer; Benchmark testing; Big data; Fault tolerance; Machine learning algorithms; Sparse matrices; Topology; Vectors
fLanguage
English
Publisher
ieee
Conference_Title
2014 43rd International Conference on Parallel Processing (ICPP)
Conference_Location
Minneapolis, MN, USA
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.36
Filename
6957236