DocumentCode :
506165
Title :
Parallel algorithms for super performance
Author :
Shakshober, D. John
Author_Institution :
Digital Equipment Corporation, BXB2-2/G08, 60 Codman Hill Road, Boxboro, MA
fYear :
1989
fDate :
12-17 Nov. 1989
Firstpage :
380
Lastpage :
388
Abstract :
This paper describes the development of parallel algorithms on M31, a large-scale, shared-memory multiprocessor VAX computer. Matrix operations have been optimized for a subset of the BLAS, the Basic Linear Algebra Subprograms. Efficient image processing algorithms were also developed for parallel convolution, correlation, and Fast Fourier Transforms (non-synchronizing one- and two-dimensional FFTs). The effect of matrix partitioning was examined using two different memory allocation strategies. We found that contiguous memory partitioning can yield performance gains beyond the linear expectation. Super performance was achieved through a parallel algorithm devised to minimize cache replacements. Fewer replacements allowed high CPU utilization with minimal system overhead. Inefficient matrix partitioning tended to stifle parallel performance because frequent cache misses created heavy bus traffic and thus increased system overhead.
Keywords :
Concurrent computing; Convolution; Fast Fourier transforms; Flexible printed circuits; Image processing; Large-scale systems; Linear algebra; Parallel algorithms; Partitioning algorithms; Performance gain;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Supercomputing, 1989. Supercomputing '89. Proceedings of the 1989 ACM/IEEE Conference on
Conference_Location :
Reno, NV, United States
Print_ISBN :
0-89791-341-8
Type :
conf
DOI :
10.1145/76263.76305
Filename :
5349000