Title of article :
Explicit parallel block Cholesky algorithms on the CRAY APP
Author/Authors :
Margreet Nool, author
Issue Information :
Journal, serial issue, year 1995
Pages :
24
From page :
91
To page :
114
Abstract :
In this paper we consider the CRAY APP, the Attached Parallel Processor of the CRAY S-MP, which consists of seven buses, each supporting up to 12 processing elements. Processing elements on different buses can communicate with the shared main memory simultaneously, but processing elements on the same bus cannot, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse (that is, a high computational intensity) and highly parallel applications are well suited to the APP; matrix-matrix multiplication is an example of such an algorithm. We illustrate how this restriction on data traffic influences performance, and we discuss a performance model of the bus architecture that accounts for changes in processor speed, data traffic speed, and cache contents. Furthermore, two algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum speed achievable on the CRAY APP is determined mainly by the performance of matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine with twenty-eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors.
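The abstract names two block variants of Cholesky factorization without spelling them out. As an illustration only, here is a minimal pure-Python sketch of both. It assumes a dense symmetric positive definite matrix stored as a list of rows and a block size `nb`. The function names are hypothetical, and this is not the paper's APP code. Both variants compute the lower-triangular L with A = L L^T. They differ only in when updates from finished block columns are applied. The right-looking form updates the whole trailing submatrix immediately after each panel. The left-looking form defers those updates and applies them to each block column just before that column is factored.

```python
import math


def _factor_diag(w, k, ke):
    """Unblocked Cholesky of the diagonal block w[k:ke, k:ke], in place.

    Assumes contributions from block columns before k were already subtracted.
    """
    for j in range(k, ke):
        s = w[j][j] - sum(w[j][p] * w[j][p] for p in range(k, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        w[j][j] = math.sqrt(s)
        for i in range(j + 1, ke):
            w[i][j] = (w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))) / w[j][j]


def _solve_panel(w, k, ke, n):
    """Triangular solve L[ke:n, k:ke] = A[ke:n, k:ke] * L[k:ke, k:ke]^-T.

    Each row i is independent of the others.
    """
    for i in range(ke, n):
        for j in range(k, ke):
            w[i][j] = (w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))) / w[j][j]


def _lower(w):
    """Zero the strict upper triangle so that w holds only L."""
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = 0.0
    return w


def cholesky_right_looking(a, nb):
    """Block right-looking Cholesky (hypothetical sketch): eager trailing updates."""
    n = len(a)
    w = [list(map(float, row)) for row in a]
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        _factor_diag(w, k, ke)
        _solve_panel(w, k, ke, n)
        # Trailing update A22 -= L21 * L21^T (lower triangle only).
        for i in range(ke, n):
            for j in range(ke, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, ke))
    return _lower(w)


def cholesky_left_looking(a, nb):
    """Block left-looking Cholesky (hypothetical sketch): deferred updates."""
    n = len(a)
    w = [list(map(float, row)) for row in a]
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        # Update the current block column with all finished columns 0:k.
        for i in range(k, n):
            for j in range(k, min(ke, i + 1)):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k))
        _factor_diag(w, k, ke)
        _solve_panel(w, k, ke, n)
    return _lower(w)


if __name__ == "__main__":
    n = 7
    # Hilbert matrix plus n*I: symmetric positive definite.
    a = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
    for f in (cholesky_right_looking, cholesky_left_looking):
        l = f(a, nb=3)
        err = max(abs(sum(l[i][p] * l[j][p] for p in range(n)) - a[i][j])
                  for i in range(n) for j in range(n))
        print(f.__name__, "max |L L^T - A| =", err)
```

For example, `cholesky_right_looking([[4.0, 2.0], [2.0, 3.0]], 1)` yields L = [[2.0, 0.0], [1.0, sqrt(2)]]. In a parallel setting like the one the abstract describes, the row-wise panel solves and the entries of the trailing update are mutually independent, so they are natural units to distribute over processing elements. The trailing update in the right-looking form and the column update in the left-looking form are matrix-matrix products, which is consistent with the abstract's remark that matrix-matrix multiplication bounds the achievable speed.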
Journal title :
Applied Numerical Mathematics
Serial Year :
1995
Record number :
941936