DocumentCode
2819068
Title
A multithreaded processor designed for distributed shared memory systems
Author
Grünewald, Winfried ; Ungerer, Theo
Author_Institution
Dept. of Comput. Design & Fault Tolerance, Karlsruhe Univ., Germany
fYear
1997
fDate
19-21 Mar 1997
Firstpage
206
Lastpage
213
Abstract
The multithreaded processor, called Rhamma, uses a fast context switch to bridge latencies caused by memory accesses or by synchronization operations. Load/store, synchronization, and execution operations of different threads of control are executed simultaneously by appropriate functional units. A fast context switch is performed whenever a functional unit comes across an operation that is destined for another unit. The overall performance depends on the speed of the context switch. We present two techniques to reduce the context switch cost to at most one processor cycle: a context switch is explicitly coded in the opcode, and a context switch buffer is used. The load/store unit shows up as the principal bottleneck. We evaluate four implementation alternatives of the load/store unit to increase processor performance.
Keywords
distributed memory systems; performance evaluation; shared memory systems; synchronisation; Rhamma; distributed shared memory systems; execution operations; fast context switch; latencies; memory accesses; multithreaded processor; opcode; performance; processor performance; synchronization; synchronization operations; Bridges; Context; Laboratories; Microcomputers; Microprocessors; Multiprocessing systems; Process design; Standards development; Switches; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Parallel and Distributed Computing, 1997. Proceedings
Conference_Location
Shanghai
Print_ISBN
0-8186-7876-3
Type
conf
DOI
10.1109/APDC.1997.574034
Filename
574034