Title :
Parallel algorithms for super performance
Author :
Shakshober, D. John
Author_Institution :
Digital Equipment Corporation, BXB2-2/G08, 60 Codman Hill Road, Boxboro, MA
Abstract :
This paper describes the development of parallel algorithms on M31, a large-scale, shared-memory multiprocessor VAX computer. Matrix operations were optimized for a subset of the BLAS (Basic Linear Algebra Subprograms). Efficient image processing algorithms were also developed for parallel convolution, correlation, and fast Fourier transforms (non-synchronizing one- and two-dimensional FFTs). The effect of matrix partitioning was examined using two different memory allocation strategies. We found that contiguous memory partitioning can yield performance gains beyond the linear expectation, that is, superlinear speedup. This super performance was achieved through a parallel algorithm devised to minimize cache replacements: fewer replacements allowed high CPU utilization with minimal system overhead. Inefficient matrix partitioning, by contrast, tended to stifle parallel performance because frequent cache misses created heavy bus traffic and thus increased system overhead.
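The contrast between contiguous and non-contiguous matrix partitioning can be illustrated with a minimal sketch. This is not the paper's code, which targeted a VAX multiprocessor; the function names (`contiguous_blocks`, `interleaved_rows`, `parallel_matvec`) and the choice of a matrix-vector product are assumptions made for illustration only. Python threads also do not give true CPU parallelism, so the sketch shows only how rows are assigned to workers, not the cache behavior itself.

```python
from concurrent.futures import ThreadPoolExecutor


def contiguous_blocks(n_rows, n_workers):
    """Split range(n_rows) into n_workers contiguous (start, stop) blocks."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def interleaved_rows(n_rows, n_workers):
    """Assign row i to worker i % n_workers (strided, non-contiguous)."""
    return [list(range(w, n_rows, n_workers)) for w in range(n_workers)]


def matvec_rows(a, x, rows):
    """Return (row index, dot product of that row with x) for each row."""
    return [(i, sum(a_ij * x_j for a_ij, x_j in zip(a[i], x))) for i in rows]


def parallel_matvec(a, x, n_workers):
    """Compute y = a @ x with each worker owning one contiguous row block."""
    y = [0.0] * len(a)
    blocks = contiguous_blocks(len(a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(matvec_rows, a, x, range(s, e)) for s, e in blocks]
        for f in futures:
            for i, v in f.result():
                y[i] = v
    return y
```

With contiguous blocks, each worker touches one unbroken range of rows, which on a row-major layout corresponds to one unbroken range of memory. With the interleaved assignment, every worker strides across the whole matrix, which is the kind of access pattern the abstract associates with frequent cache misses and heavy bus traffic.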
Keywords :
Concurrent computing; Convolution; Fast Fourier transforms; Flexible printed circuits; Image processing; Large-scale systems; Linear algebra; Parallel algorithms; Partitioning algorithms; Performance gain
Conference_Title :
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89)
Conference_Location :
Reno, NV, United States
Print_ISBN :
0-89791-341-8
DOI :
10.1145/76263.76305