DocumentCode :
625638
Title :
Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions
Author :
Solomonik, Edgar ; Matthews, Devin ; Hammond, Jeff R. ; Demmel, J.
Author_Institution :
Dept. of EECS, Univ. of California, Berkeley, CA, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
813
Lastpage :
824
Abstract :
Cyclops (cyclic-operations) Tensor Framework (CTF) is a distributed library for tensor contractions. CTF aims to scale high-dimensional tensor contractions such as those required in the Coupled Cluster (CC) electronic structure method to massively-parallel supercomputers. The framework preserves tensor structure by subdividing tensors cyclically, producing a regular parallel decomposition. An internal virtualization layer provides completely general mapping support while maintaining ideal load balance. The mapping framework decides on the best mapping for each tensor contraction at run-time via explicit calculations of memory usage and communication volume. CTF employs a general redistribution kernel, which transposes tensors of any dimension between arbitrary distributed layouts, yet touches each piece of data only once. Sequential symmetric contractions are reduced to matrix multiplication calls via tensor index transpositions and partial unpacking. The user-level interface elegantly expresses arbitrary-dimensional generalized tensor contractions in the form of a domain specific language. We demonstrate performance of CC with single and double excitations on 8192 nodes of Blue Gene/Q and show that CTF outperforms NWChem on Cray XE6 supercomputers for benchmarked systems.
Keywords :
matrix multiplication; parallel processing; resource allocation; tensors; user interfaces; virtualisation; Blue Gene/Q; CC electronic structure method; CC performance; CTF; arbitrary distributed layouts; arbitrary-dimensional generalized tensor contractions; communication imbalance reduction; communication volume; coupled cluster electronic structure method; cyclic tensor subdivision; cyclic-operation tensor framework; cyclops tensor framework; distributed library; domain specific language; double excitations; general mapping support; general redistribution kernel; high-dimensional tensor contractions; ideal load balance; internal virtualization layer; load imbalance elimination; massively parallel contractions; massively-parallel supercomputers; matrix multiplication calls; memory usage; partial unpacking; regular parallel decomposition; sequential symmetric contractions; single excitations; tensor index transpositions; tensor structure preservation; user-level interface; Chemistry; Clustering algorithms; Equations; Indexes; Manganese; Program processors; Tensile stress; Coupled Cluster; Cyclops; communication-avoiding algorithms; tensor contractions;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
ISSN :
1530-2075
Print_ISBN :
978-1-4673-6066-1
Type :
conf
DOI :
10.1109/IPDPS.2013.112
Filename :
6569864