DocumentCode :
3295939
Title :
Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance?
Author :
Cuppu, Vinodh ; Jacob, Bruce
Author_Institution :
Dept. of Electr. & Comput. Eng., Maryland Univ., College Park, MD, USA
fYear :
2001
fDate :
2001
Firstpage :
62
Lastpage :
71
Abstract :
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM-system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, the memory-controller page protocol, and the algorithms for assigning request priorities and scheduling requests dynamically. Within this design space we see wide variation in application execution times; for example, execution times for the SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than execution times on an otherwise identical configuration that uses 32-byte bursts. These two system configurations are relatively close to each other in the design space; performance differences become even more pronounced for designs further apart. This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce the "system overhead": the portion of the primary memory system's overhead that is due not to DRAM latency but to effects such as bus turnaround time and request-queueing inefficiencies caused by read/write request interleaving. Our simulator models a 2 GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking, supporting up to 32 outstanding misses at both the level-1 and level-2 caches, with split-transaction buses to all DRAM banks.
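The design-space parameters listed in the abstract can be pictured as a configuration record that a simulation sweep enumerates. The sketch below is illustrative only; the class and field names are hypothetical and do not reflect the authors' simulator, and the two enumerated points simply mirror the 32-byte vs. 64-byte burst comparison mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DramSystemConfig:
    """One hypothetical point in the DRAM-system design space described in the abstract."""
    channels: int             # number of independent memory channels
    channel_width_bits: int   # data-bus width per channel (e.g. 32 for a 2-way ganged Rambus channel)
    burst_bytes: int          # burst size per request (e.g. 32 or 64 bytes)
    queue_depth: int          # memory-controller request-queue entries
    page_policy: str          # memory-controller page protocol: "open" or "closed"

# Enumerate a small slice of the space: the two nearby configurations compared
# in the abstract, differing only in burst size.
configs = [
    DramSystemConfig(channels=1, channel_width_bits=32, burst_bytes=b,
                     queue_depth=32, page_policy="open")
    for b in (32, 64)
]

for cfg in configs:
    print(cfg)
```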
Keywords :
parallel architectures; performance evaluation; processor scheduling; protocols; timing; SPEC CPU 2000 integer suite; application execution times; burst sizes; concurrency; fixed CPU architecture; fixed DRAM timing specification; ganged Direct Rambus organization; high-performance DRAM systems; latency; memory-controller page protocol; organizations; queue sizes; system overhead; uniprocessor DRAM-system performance; Bandwidth; Concurrent computing; Delay; Dynamic scheduling; Performance loss; Protocols; Random access memory; Read-write memory; Scheduling algorithm; Timing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA 2001)
Conference_Location :
Göteborg, Sweden
ISSN :
1063-6897
Print_ISBN :
0-7695-1162-7
Type :
conf
DOI :
10.1109/ISCA.2001.937433
Filename :
937433