Title :
Efficient algorithms for block-cyclic redistribution of arrays
Author :
Lim, Young Won ; Bhat, Prashanth B. ; Prasanna, Viktor K.
Author_Institution :
Dept. of Electr. Eng. Syst., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
We present new algorithmic techniques for a classical research problem, runtime redistribution of an array from one block-cyclic layout to another. Our methodology for reducing communication overheads is based on a generalized circulant matrix formalism. Using this formalism, we derive direct, indirect, and hybrid communication schedules for the cyclic redistribution problem when the block size changes by an integer factor K. We have also developed formulae to estimate the timing performance of each of these schedules for a given parallel machine and redistribution problem. In our indirect communication schedule, blocks are moved from a source processor to a destination processor through intermediate “relay” processors. This reduces the number of communication steps by an order of magnitude, in comparison with previous approaches. This algorithm performs cyclic(x) to cyclic(Kx) redistribution on P processors in [log2K]+2 steps. Implementations of these algorithms on the Cray T3D and on the IBM SP-2 show superior performance over previous approaches. Since our algorithms are developed using MPI, they can be easily ported to different application environments. Our techniques can be used in the design of scalable redistribution libraries, in efficient implementations of the REDISTRIBUTE directive of HPF and in developing parallel algorithms for various HPC applications
Keywords :
Cray computers; IBM computers; data structures; message passing; parallel algorithms; parallel machines; scheduling; software performance evaluation; software portability; Cray T3D; HPF; IBM SP-2; MPI; algorithmic techniques; block size; block-cyclic layout; block-cyclic redistribution of arrays; communication overheads; destination processor; direct communication; generalized circulant matrix; hybrid communication; indirect communication; indirect communication schedule; parallel machine; performance; runtime array redistribution; scalable redistribution libraries; software portability; source processor; timing performance; Algorithm design and analysis; High performance computing; Libraries; Parallel algorithms; Parallel machines; Processor scheduling; Program processors; Programming profession; Search problems; Signal processing algorithms;
Conference_Titel :
Parallel and Distributed Processing, 1996., Eighth IEEE Symposium on
Conference_Location :
New Orleans, LA
Print_ISBN :
0-8186-7683-3
DOI :
10.1109/SPDP.1996.570319