DocumentCode
451282
Title
Scalable NIC-based Reduction on Large-scale Clusters
Author
Moody, Adam ; Fernandez, Juan ; Petrini, Fabrizio ; Panda, Dhabaleswar K.
Author_Institution
The Ohio State University, Columbus
fYear
2003
fDate
15-21 Nov. 2003
Firstpage
59
Lastpage
59
Abstract
Many parallel algorithms require efficient reduction collectives. In response, researchers have designed algorithms considering a range of parameters including data size, system size, and communication characteristics. Throughout this past work, however, processing was limited to the host CPU. Today, modern Network Interface Cards (NICs) sport programmable processors with substantial memory, and thus introduce a fresh variable into the equation. In this paper, we investigate this new option in the context of large-scale clusters. Through experiments on the 960-node, 1920-processor ASCI Linux Cluster (ALC) at Lawrence Livermore National Laboratory, we show that NIC-based reductions outperform host-based algorithms in terms of reduced latency and increased consistency. In particular, in the largest configuration tested (1812 processors), our NIC-based algorithm summed single-element vectors of 32-bit integers and 64-bit floating-point numbers in 73 µs and 118 µs, respectively. These results represent respective improvements of 121% and 39% over the production-level MPI library.
Keywords
Algorithm design and analysis; Automatic logic units; Clustering algorithms; Context; Equations; Laboratories; Large-scale systems; Linux; Network interfaces; Parallel algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Supercomputing, 2003 ACM/IEEE Conference
Print_ISBN
1-58113-695-1
Type
conf
DOI
10.1109/SC.2003.10051
Filename
1592962