Title : 
The Design Space of Data-Parallel Memory Systems
         
        
            Author : 
Ahn, Jung Ho ; Erez, Mattan ; Dally, William J.
         
        
            Author_Institution : 
Comput. Syst. Lab., Stanford Univ., CA
         
        
        
        
        
        
            Abstract : 
Data-parallel memory systems must maintain a large number of outstanding memory references to fully use increasing DRAM bandwidth in the presence of rising latencies. Additionally, throughput is increasingly sensitive to the reference patterns due to the rising latency of issuing DRAM commands, switching between reads and writes, and precharging/activating internal DRAM banks. We study the design space of data-parallel memory systems in light of these trends of increasing concurrency, latency, and sensitivity to access patterns. We perform a detailed performance analysis of scientific and multimedia applications and micro-benchmarks, varying DRAM parameters and the memory-system configuration. We identify the interference between concurrent read and write memory-access threads, and bank conflicts, both within a single thread and across multiple threads, as the most critical factors affecting performance. We then develop hardware techniques to minimize throughput degradation. We advocate either relying on multiple concurrent accesses from a single memory-reference thread only, while sacrificing load-balance, or introducing new hardware to maintain both locality of reference and load-balance between multiple DRAM channels with multiple threads. We show that a low-cost configuration with only 16 channel-buffer entries achieves over 80% of peak throughput in most cases.
         
        
            Keywords : 
DRAM chips; multi-threading; parallel architectures; parallel memories; resource allocation; DRAM bandwidth; data-parallel memory system; load-balancing; memory-access thread; memory-reference thread; microbenchmark; multimedia application; scientific application; Bandwidth; Concurrent computing; Degradation; Delay; Hardware; Interference; Performance analysis; Random access memory; Throughput; Yarn;
         
        
        
        
            Conference_Title : 
SC 2006 Conference, Proceedings of the ACM/IEEE
         
        
            Conference_Location : 
Tampa, FL
         
        
            Print_ISBN : 
0-7695-2700-0
         
        
            Electronic_ISBN : 
0-7695-2700-0