Regional Information Center for Science and Technology (RICeST) - Architecture and performance of the Hitachi SR2201 massively parallel processor system

DocumentCode :

3414561

Title :

Architecture and performance of the Hitachi SR2201 massively parallel processor system

Author :

Fujii, Hiroaki ; Yasuda, Yoshiko ; Akashi, Hideya ; Inagami, Yasuhiro ; Koga, Makoto ; Ishihara, Osamu ; Kashiyama, Masamori ; Wada, Hideo ; Sumimoto, Tsutomu

Author_Institution :

Central Res. Lab., Hitachi Ltd., Kokubunji, Japan

fYear :

1997

fDate :

1-5 Apr 1997

Firstpage :

233

Lastpage :

241

Abstract :

RISC-based Massively Parallel Processors (MPPs) often show low efficiency in real-world applications because of cache miss penalties, insufficient memory system throughput, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and 600 GFLOPS peak performance, overcomes these problems by introducing three novel features. First, its processor, the 150 MHz HARP-IE, addresses the cache miss penalty with “pseudo vector processing” (PVP). In PVP, data is prefetched into a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory system bottleneck. Third, inter-processor communication achieves high performance on the three-dimensional crossbar network by using a “remote DMA transfer” protocol and hardware-based cache coherency. As a result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors in the LINPACK benchmark, which is almost 72% of the peak performance.
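
The efficiency figure quoted in the abstract can be checked with a short calculation. The sketch below assumes that each 150 MHz processor completes two floating-point operations per cycle (300 MFLOPS peak per processor). That rate is not stated in the abstract; it is an assumption chosen because it is consistent with the quoted peak of roughly 600 GFLOPS for 2048 processors. The function names are illustrative only.

```python
# Rough check of the peak and LINPACK efficiency figures in the abstract.
# ASSUMPTION (not stated in the abstract): each processor completes
# 2 floating-point operations per clock cycle at 150 MHz.
CLOCK_HZ = 150e6          # 150 MHz clock, from the abstract
FLOPS_PER_CYCLE = 2       # assumed, see note above


def peak_gflops(processors: int) -> float:
    """Theoretical peak in GFLOPS for a given processor count."""
    return processors * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9


def efficiency(measured_gflops: float, processors: int) -> float:
    """Fraction of theoretical peak achieved by a measured result."""
    return measured_gflops / peak_gflops(processors)


if __name__ == "__main__":
    # Full 2048-processor configuration: compare with "600 GFLOPS peak".
    print(f"peak at 2048 processors: {peak_gflops(2048):.1f} GFLOPS")
    # LINPACK run from the abstract: 220.4 GFLOPS on 1024 processors.
    print(f"LINPACK efficiency: {efficiency(220.4, 1024):.1%}")
```

Under this assumption the 1024-processor peak is 307.2 GFLOPS, so 220.4 GFLOPS corresponds to a little under 72% of peak, in line with the abstract's wording.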

Keywords :

parallel architectures; parallel processing; performance evaluation; reduced instruction set computing; 150 MHz HARP-IE; Hitachi SR2201 massively parallel processor system; LINPACK benchmark; RISC-based processors; cache miss penalty; hardware-based cache coherency; inter-processor communication performance; memory system bottleneck; multi-bank memory architecture; protocol; Application software; Cache memory; Computer architecture; Concurrent computing; Degradation; Laboratories; Prefetching; Reduced instruction set computing; Software performance; Throughput;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel Processing Symposium, 1997. Proceedings., 11th International

Conference_Location :

Geneva

ISSN :

1063-7133

Print_ISBN :

0-8186-7793-7

Type :

conf

DOI :

10.1109/IPPS.1997.580901

Filename :

580901

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3414561