Title :
Reducing communication overhead in distributed learning by an order of magnitude (almost)
Author :
Oland, Anders ; Raj, Bhiksha
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Large-scale distributed learning plays an ever-increasing role in modern computing. However, whether training on a compute cluster with thousands of nodes or on a single multi-GPU machine, the most significant bottleneck is communication. In this work, we explore the effects of applying quantization and encoding to the parameters of distributed models. We show that, for a neural network, this can be done without slowing down convergence or hurting the generalization of the model. In fact, in our experiments we reduced the communication overhead by nearly an order of magnitude while actually improving the generalization accuracy.
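Example :
The abstract describes quantizing and encoding model parameters before they are communicated. The sketch below illustrates the general idea only, a simple uniform quantizer to a small number of bits, and is not the authors' specific scheme; the bit-width, function names, and decoding convention are illustrative assumptions.

```python
import numpy as np

def quantize(params: np.ndarray, num_bits: int = 4):
    """Uniformly quantize a float array to 2**num_bits levels.

    Returns integer codes plus the (scale, offset) needed to decode, so only
    num_bits per value (plus two floats) would need to be transmitted once
    the codes are bit-packed or entropy-coded (the "encoding" step).
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(params.min()), float(params.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((params - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Recover approximate float parameters from the transmitted codes."""
    return codes.astype(np.float32) * scale + offset

# Usage: 4-bit codes carry roughly 8x less payload than float32 parameters
# once packed, at the cost of a bounded quantization error.
w = np.random.randn(1000).astype(np.float32)
codes, scale, offset = quantize(w, num_bits=4)
w_hat = dequantize(codes, scale, offset)
print("max reconstruction error:", np.abs(w - w_hat).max())
```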
Keywords :
distributed processing; learning (artificial intelligence); neural nets; distributed model parameters; encoding; large-scale distributed learning; multi-GPU machine; neural network; Accuracy; Convergence; Encoding; Entropy; Heuristic algorithms; Quantization (signal); Training; Compression; Distributed Training; Neural Networks;
Conference_Title :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI :
10.1109/ICASSP.2015.7178365