DocumentCode :
3295939
Title :
Concurrency, latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance?
Author :
Cuppu, Vinodh ; Jacob, Bruce
Author_Institution :
Dept. of Electr. & Comput. Eng., Maryland Univ., College Park, MD, USA
fYear :
2001
fDate :
2001
Firstpage :
62
Lastpage :
71
Abstract :
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM-system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, the memory-controller page protocol, and the algorithms for assigning request priorities and scheduling requests dynamically. Within this design space we see wide variation in application execution times; for example, execution times for the SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than execution times on an otherwise identical configuration that uses 32-byte bursts. These two system configurations are relatively close to each other in the design space; performance differences become even more pronounced for designs further apart. This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce the "system overhead": the portion of the primary memory system's overhead that is due not to DRAM latency but to effects such as bus turnaround time and request-queueing inefficiencies caused by read/write request interleaving. Our simulator models a 2 GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking, supporting up to 32 outstanding misses at both the level-1 and level-2 caches, with split-transaction buses to all DRAM banks.
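The design-space parameters listed in the abstract can be pictured as a configuration record that a simulation sweep enumerates. The sketch below is illustrative only; the class and field names are hypothetical and do not reflect the authors' simulator, and the two enumerated points simply mirror the 32-byte vs. 64-byte burst comparison mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DramSystemConfig:
    """One hypothetical point in the DRAM-system design space described in the abstract."""
    channels: int             # number of independent memory channels
    channel_width_bits: int   # data-bus width per channel (e.g. 32 for a 2-way ganged Rambus channel)
    burst_bytes: int          # burst size per request (e.g. 32 or 64 bytes)
    queue_depth: int          # memory-controller request-queue entries
    page_policy: str          # memory-controller page protocol: "open" or "closed"

# Enumerate a small slice of the space: the two nearby configurations compared
# in the abstract, differing only in burst size.
configs = [
    DramSystemConfig(channels=1, channel_width_bits=32, burst_bytes=b,
                     queue_depth=32, page_policy="open")
    for b in (32, 64)
]

for cfg in configs:
    print(cfg)
```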
Keywords :
parallel architectures; performance evaluation; processor scheduling; protocols; timing; SPEC CPU 2000 integer suite; application execution times; burst sizes; concurrency; fixed CPU architecture; fixed DRAM timing specification; ganged Direct Rambus organization; high-performance DRAM systems; latency; memory-controller page protocol; organizations; queue sizes; system overhead; uniprocessor DRAM-system performance; Bandwidth; Concurrent computing; Delay; Dynamic scheduling; Performance loss; Protocols; Random access memory; Read-write memory; Scheduling algorithm; Timing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA 2001)
Conference_Location :
Göteborg, Sweden
ISSN :
1063-6897
Print_ISBN :
0-7695-1162-7
Type :
conf
DOI :
10.1109/ISCA.2001.937433
Filename :
937433