Title :
The impact of negative acknowledgments in shared memory scientific applications
Author :
Chaudhuri, Mainak ; Heinrich, Mark
Author_Institution :
Comput. Syst. Lab., Cornell Univ., Ithaca, NY, USA
Date :
2/1/2004
Abstract :
Negative acknowledgments (NACKs) and subsequent retries, used to resolve races and to enforce a total order among shared memory accesses in distributed shared memory (DSM) multiprocessors, not only introduce extra network traffic and contention but also increase node controller occupancy, especially at the home node. We present possible protocol optimizations to minimize these retries and offer a thorough study of the performance effects of these messages on six scalable scientific applications running on systems of 64 nodes and larger. To eliminate NACKs, we present a mechanism that queues pending requests at the main memory of the home node and augment it with a novel technique for combining pending read requests, thereby accelerating parallel execution on 64 nodes by as much as 41 percent (a speedup of 1.41) compared to a modified version of the SGI Origin 2000 protocol. We further design and evaluate a protocol that combines this mechanism with a technique we call write string forwarding, used in the AlphaServer GS320 and Piranha systems. We find that without careful design considerations, especially regarding atomic read-modify-write operations, this aggressive write forwarding can hurt performance. We identify and evaluate the micro-architectural support necessary to solve this problem. We compare the performance of these novel NACK-free protocols with a base bitvector protocol, a modified version of the SGI Origin 2000 protocol, and a NACK-free protocol that uses dirty sharing and write string forwarding as in the Piranha system. To understand the effects of network speed and topology, the evaluation is carried out on three network configurations.
Keywords :
cache storage; distributed shared memory systems; minimisation; natural sciences computing; parallel processing; protocols; AlphaServer GS320; NACK-free protocol optimization; Piranha systems; SGI Origin 2000 protocol; atomic read-modify-write operations; base bitvector protocol; cache coherence protocol; dirty sharing; distributed shared memory multiprocessors; negative acknowledgments; network speed; network topology; node controller occupancy; parallel execution; queue pending requests; write string forwarding; Acceleration; Access protocols; Communication system traffic control; Intelligent networks; Large-scale systems; Network topology; Performance analysis; Resource management; Samarium; Traffic control;
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2004.1264797