DocumentCode
154138
Title
Kylix: A Sparse Allreduce for Commodity Clusters
Author
Zhao, Huasha; Canny, John
Author_Institution
Comput. Sci. Div., Univ. of California, Berkeley, Berkeley, CA, USA
fYear
2014
fDate
9-12 Sept. 2014
Firstpage
273
Lastpage
282
Abstract
Allreduce is a basic building block for parallel computing. Our target here is "Big Data" processing on commodity clusters (mostly sparse power-law data). Allreduce can be used to synchronize models, to maintain distributed datasets, and to perform operations on distributed data such as sparse matrix multiply. We first review a key constraint on cluster communication, the minimum efficient packet size, which hampers the use of direct all-to-all protocols on large networks. Our allreduce network is a nested, heterogeneous-degree butterfly. We show that communication volume in the lower layers is typically much less than in the top layer, and that total communication across all layers is a small constant factor larger than the top layer's, which is close to optimal. A chart of network communication volume across layers has a characteristic "Kylix" shape, which gives the method its name. For optimum performance, the butterfly degrees also decrease down the layers. Furthermore, to route sparse updates efficiently to the nodes that need them, the network must be nested. While the approach is amenable to various kinds of sparse data, almost all "Big Data" sets show power-law statistics, and from the properties of these we derive methods for optimal network design. Finally, we present experiments with Kylix on Amazon EC2 demonstrating significant improvements over existing systems such as PowerGraph and Hadoop.
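For illustration, the layered butterfly structure described above can be sketched with a minimal dense simulation (a hypothetical Python sketch, not the paper's implementation; Kylix additionally nests the layers and scatter-reduces sparse partitions within each group):

import numpy as np

def butterfly_allreduce(vals, degrees):
    # Simulated allreduce over a butterfly with per-layer degrees.
    # vals: one numpy vector per node; prod(degrees) must equal the
    # node count. At layer l, each node sums contributions from the
    # d-1 peers whose mixed-radix node index differs only in digit l,
    # so after all layers every node holds the global sum.
    n = len(vals)
    assert int(np.prod(degrees)) == n
    cur = [v.copy() for v in vals]
    stride = 1
    for d in degrees:
        nxt = []
        for i in range(n):
            digit = (i // stride) % d
            base = i - digit * stride          # index with digit l zeroed
            peers = [base + k * stride for k in range(d)]
            nxt.append(sum(cur[p] for p in peers))
        cur = nxt
        stride *= d
    return cur

# 6 nodes, with degrees decreasing down the layers (3, then 2),
# as the abstract suggests for optimum performance.
vals = [np.full(4, float(i)) for i in range(6)]
out = butterfly_allreduce(vals, [3, 2])
assert all(np.allclose(v, 15.0) for v in out)  # 0+1+...+5 = 15 on every node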
Keywords
Big Data; data mining; graph theory; learning (artificial intelligence); matrix multiplication; network theory (graphs); parallel processing; sparse matrices; synchronisation; tree data structures; Amazon EC2; Big Data processing; characteristic Kylix shape; cluster communication; commodity clusters; communication volume; direct all-to-all protocols; distributed datasets; distributed graph mining; low-layers; machine learning; minimum efficient packet size; model synchronization; nested-heterogeneous-degree butterfly; network communication volume; optimal network design; optimum performance; parallel computing; power-law statistics; sparse allreduce; sparse matrix multiply; sparse power-law data; sparse updates; top-layer; Benchmark testing; Big data; Fault tolerance; Machine learning algorithms; Sparse matrices; Topology; Vectors
fLanguage
English
Publisher
ieee
Conference_Title
2014 43rd International Conference on Parallel Processing (ICPP)
Conference_Location
Minneapolis, MN, USA
ISSN
0190-3918
Type
conf
DOI
10.1109/ICPP.2014.36
Filename
6957236