Author_Institution :
Adv. Micro Devices, Inc., Austin, TX, USA
Abstract :
With increasing data rates and power density, high-performance memories have started to require dynamic thermal management (DTM), following the trend of processors and hard drives. There is also a lack of memory thermal models and simulation tools to facilitate research on memory DTM. This study investigates the approach of coordinating the processor, which is the source of memory access requests, with the memory to improve system performance and/or power efficiency during memory thermal emergencies. Two such schemes, namely adaptive core gating (DTM-ACG) and coordinated DVFS (DTM-CDVFS), are proposed and evaluated on a real server platform. DTM-ACG gates processor cores, and DTM-CDVFS scales down the frequency and voltage level of processor cores, according to the memory thermal emergency level. Their combination, namely DTM-COMB, is also evaluated. The experimental results show that the two schemes, while successfully controlling memory activity and handling thermal emergencies, improve performance significantly under the given thermal envelope. The measurement results from an Intel SR1500AL server testbed show that, on average, DTM-ACG and DTM-CDVFS improve performance by 6.7 and 15.3 percent, respectively, over a prior memory bandwidth throttling scheme. DTM-CDVFS also reduces the processor power rate by 15.5 percent and system (including processor and memory) energy by 22.7 percent. Additionally, we propose a DRAM thermal model and validate it with measurements on the instrumented server platform. We find that our proposed model faithfully captures the dynamic DRAM temperature changes; the average difference between the modeled and measured temperatures is less than 1°C.
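The idea of selecting a processor setting from the memory thermal emergency level can be illustrated with a minimal sketch. This is not the authors' implementation: the temperature thresholds, the number of levels, the four-core assumption, the frequency steps, and all names (`THRESHOLDS_C`, `emergency_level`, `dtm_acg`, `dtm_cdvfs`) are hypothetical values chosen only to show the shape of the two policies.

```python
# Hypothetical DRAM temperature thresholds (deg C) that separate emergency
# levels. These numbers are illustrative assumptions, not taken from the paper.
THRESHOLDS_C = (80.0, 84.0, 88.0)

# Hypothetical per-level settings for an assumed 4-core processor.
# Index 0 is "no emergency"; the last index is the most severe level.
ACG_ACTIVE_CORES = (4, 3, 2, 1)           # DTM-ACG: cores left ungated
CDVFS_FREQ_GHZ = (3.0, 2.67, 2.33, 2.0)   # DTM-CDVFS: core frequency step


def emergency_level(temp_c: float) -> int:
    """Count how many thresholds the measured temperature has reached."""
    return sum(temp_c >= t for t in THRESHOLDS_C)


def dtm_acg(temp_c: float) -> int:
    """Sketch of adaptive core gating: fewer active cores at higher levels."""
    return ACG_ACTIVE_CORES[emergency_level(temp_c)]


def dtm_cdvfs(temp_c: float) -> float:
    """Sketch of coordinated DVFS: lower core frequency at higher levels."""
    return CDVFS_FREQ_GHZ[emergency_level(temp_c)]


if __name__ == "__main__":
    for temp in (75.0, 82.0, 86.0, 90.0):
        print(temp, emergency_level(temp), dtm_acg(temp), dtm_cdvfs(temp))
```

Both policies reduce the rate of memory requests at their source rather than throttling bandwidth at the memory controller; in a real system the chosen setting would be applied through the platform's core-gating and frequency-scaling mechanisms, which this sketch leaves out.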
Keywords :
DRAM chips; microprocessor chips; thermal management (packaging); DRAM system; DRAM thermal model; DTM-ACG gates processor core; DTM-CDVFS; DTM-COMB; Intel SR1500AL server testbed; adaptive core gating; coordinated DVFS; dynamic thermal management; hard drive; high-performance memory; instrumented server platform; memory DTM; memory bandwidth throttling scheme; memory thermal emergency level; memory thermal model; power efficiency; simulation tool; thermal emergency handling; thermal envelope; thermal modeling; Bandwidth; Random access memory; Servers; Temperature measurement; Temperature sensors; Thermal management; Multicore; performance; power