Title :
Datarol: a parallel machine architecture for fine-grain multithreading
Author :
Amamiya, Makoto ; Tomiyasu, Hiroshi ; Kusakabe, Shigeru
Author_Institution :
Dept. of Intelligent Syst., Kyushu Univ., Fukuoka, Japan
Abstract :
We discuss a design principle for massively parallel distributed-memory multiprocessor architectures that solves the latency problem, and present the Datarol machine architecture. Latencies caused by remote memory accesses and remote procedure calls are among the most serious problems in massively parallel computers. To eliminate the processor idle time caused by these latencies, processors must perform fast context switching among fine-grain concurrent processes. First, we present a processor architecture, called Datarol-II, that supports efficient fine-grain multithreaded execution by performing such fast context switching. In the Datarol-II processor, an implicit register load/store mechanism is embedded in the execution pipeline to reduce the memory access overhead caused by context switching. A two-level hierarchical memory system and a load control mechanism are also introduced to reduce local memory access latency. We then present a cost-effective design of the Datarol-II processor that incorporates an off-the-shelf high-end microprocessor while preserving the fine-grain dataflow concept. The off-the-shelf Pentium microprocessor is used for core processing, and a co-processor called the FMP (Fine-grain Message Processor) is designed for fine-grained message handling and communication control. The FMP is designed on the basis of the FMD (Fine-grain Message Driven) execution model, in which fine-grain multithreaded execution is driven and controlled by simple fine-grain message communications.
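A minimal software sketch, under assumed names and data structures, of how a fine-grain message-driven scheduler in the spirit of the FMD model might dispatch fine-grain threads: each message carries a continuation (an activation frame plus an entry point), so a context switch reduces to dequeuing the next message rather than saving and restoring a full register set. The Message/Frame layout and the add_first/add_second threads below are illustrative assumptions, not the FMP's actual design.

/*
 * Hypothetical sketch of a fine-grain message-driven (FMD-style) scheduler.
 * A "context switch" is simply dequeuing the next message and invoking the
 * continuation it names on the activation frame it targets.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Frame Frame;                 /* per-activation local storage   */
typedef void (*Entry)(Frame *frame, int value);

struct Frame { int partial_sum; };

typedef struct Message {                    /* a fine-grain message           */
    Frame *frame;                           /* target activation frame        */
    Entry  entry;                           /* continuation (thread entry)    */
    int    value;                           /* operand carried by the message */
    struct Message *next;
} Message;

static Message *queue_head, *queue_tail;    /* pending-message queue          */

static void send(Frame *frame, Entry entry, int value)
{
    Message *m = malloc(sizeof *m);
    m->frame = frame; m->entry = entry; m->value = value; m->next = NULL;
    if (queue_tail) queue_tail->next = m; else queue_head = m;
    queue_tail = m;
}

/* Two fine-grain threads of one activation: store an operand, then combine. */
static void add_first(Frame *f, int v)  { f->partial_sum = v; }
static void add_second(Frame *f, int v) { printf("%d\n", f->partial_sum + v); }

int main(void)
{
    Frame f = {0};
    send(&f, add_first, 40);                /* operands arrive as messages    */
    send(&f, add_second, 2);

    /* Scheduler loop: each dequeue acts as an implicit context switch.      */
    while (queue_head) {
        Message *m = queue_head;
        queue_head = m->next;
        if (!queue_head) queue_tail = NULL;
        m->entry(m->frame, m->value);
        free(m);
    }
    return 0;                               /* prints 42                      */
}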
Keywords :
distributed memory systems; parallel architectures; parallel programming; Datarol; Datarol-II; communication controls; context switching; design principle; fine grained message handling; fine-grain multithreading; high-end microprocessor; latency problem; local memory access latency; massively parallel computers; massively parallel distributed-memory multiprocessor architecture; parallel machine architecture; processor architecture; processor idle times; remote memory access; remote procedure call; Communication system control; Computer architecture; Concurrent computing; Coprocessors; Delay; Load flow control; Microprocessors; Parallel machines; Pipelines; Registers;
Conference_Title :
Massively Parallel Programming Models, 1997. Proceedings. Third Working Conference on
Conference_Location :
London
Print_ISBN :
0-8186-8427-5
DOI :
10.1109/MPPM.1997.715971