Abstract:
Collective operations, such as allreduce, are widely regarded as critical limiting factors for achieving high performance in massively parallel applications. Conventional host-based implementations, which introduce a large number of point-to-point communications, are inefficient in large-scale systems. To address this issue, we propose a switch chip design that accelerates collective operations, in particular the allreduce operation. The major advantage of the proposed solution is its high scalability, since expensive point-to-point communications are avoided. Two kinds of allreduce operations, block-allreduce and burst-allreduce, are implemented for short and long messages, respectively. We evaluated the proposed design with both a cycle-accurate simulator and an FPGA prototype system. The experimental results show that the switch-based allreduce implementation is efficient and scalable, especially in large-scale systems. In the prototype, our switch-based implementation significantly outperforms the host-based one, with a 16x improvement in MPI time on 16 nodes. Furthermore, the simulation shows that, when scaling from 2 to 4096 nodes, the switch-based allreduce latency increases only slightly, by less than 2 μs.
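For reference, applications invoke the host-based allreduce that the proposed switch offloads through the standard MPI_Allreduce call; the sketch below is a minimal illustration of that baseline (the reduction operator, data type, and message size are illustrative assumptions, not taken from the paper). The switch-based design targets exactly this call, replacing the point-to-point exchanges behind it with in-network reduction.

/* Minimal sketch (illustrative, not from the paper): the host-side
 * MPI_Allreduce call that a switch-offloaded implementation accelerates
 * transparently. Operator, type, and count are assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank contributes one value; the reduced sum is returned to all
     * ranks. A switch-based allreduce performs this reduction in the network
     * fabric instead of via host-based point-to-point exchanges. */
    double local = (double)rank;
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("allreduce sum = %f\n", global);

    MPI_Finalize();
    return 0;
}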
Keywords:
collective operations; allreduce; block-allreduce; burst-allreduce; message passing interface; field programmable gate arrays; FPGA prototype system; cycle-accurate simulator; massively parallel applications; large-scale systems; switches