DocumentCode
1659118
Title
Active memory techniques for ccNUMA multiprocessors
Author
Kim, Daehyun ; Chaudhuri, Mainak ; Heinrich, Mark
Author_Institution
Comput. Syst. Lab., Cornell Univ., Ithaca, NY, USA
fYear
2003
Abstract
Our recent work on uniprocessor and single-node multiprocessor (SMP) active memory systems uses address remapping techniques in conjunction with extended cache coherence protocols to improve access locality in processor caches. In this paper we extend that work and introduce the novel concept of multi-node active memory systems. We present the design of multi-node active memory cache coherence protocols that help reduce remote memory latency and improve the scalability of matrix transpose and parallel reduction on distributed shared memory (DSM) multiprocessors. We evaluate our design on seven applications through execution-driven simulation of small- and medium-scale multiprocessors. On a 32-processor system, an active-memory-optimized matrix transpose achieves speedups ranging from 1.53 to 2.01, and parallel reduction achieves speedups ranging from 1.19 to 2.81, over normal parallel executions.
Keywords
cache storage; delays; distributed shared memory systems; matrix algebra; parallel programming; performance evaluation; protocols; DSM multiprocessors; cache coherence protocols; ccNUMA multiprocessors; distributed shared memory; execution-driven simulation; matrix transpose; multi-node active memory systems; parallel reduction; remote memory latency; scalability; speedup; Access protocols; Computer architecture; Control systems; Delay; Hardware; Laboratories; Network interfaces; Prefetching; Scalability; Scattering
fLanguage
English
Publisher
IEEE
Conference_Titel
International Parallel and Distributed Processing Symposium, 2003. Proceedings.
ISSN
1530-2075
Print_ISBN
0-7695-1926-1
Type
conf
DOI
10.1109/IPDPS.2003.1213085
Filename
1213085