Title :
McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures
Author :
Li, Sheng ; Ahn, Jung Ho ; Strong, Richard D. ; Brockman, Jay B. ; Tullsen, Dean M. ; Jouppi, Norman P.
Author_Institution :
Univ. of Notre Dame, Notre Dame, IN, USA
Abstract :
This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90 nm to 22 nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics such as energy-delay-area² product (EDA²P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering brings tradeoffs between area and performance: the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them because of the synergies of cache sharing. Combining the power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22 nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
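The metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual product definitions (energy × delay, energy × delay × area, and energy × delay × area²). The `DesignPoint` class, the function names, and all numeric values are hypothetical placeholders, not McPAT's interface or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One candidate design (hypothetical container, not a McPAT type)."""
    name: str
    energy_j: float   # total energy for the workload, in joules
    delay_s: float    # execution time for the workload, in seconds
    area_mm2: float   # die area, in square millimetres


def edp(p: DesignPoint) -> float:
    """Energy-delay product: ignores die area (and so die cost)."""
    return p.energy_j * p.delay_s


def edap(p: DesignPoint) -> float:
    """Energy-delay-area product: area weighted linearly."""
    return edp(p) * p.area_mm2


def eda2p(p: DesignPoint) -> float:
    """Energy-delay-area^2 product: area weighted quadratically."""
    return edp(p) * p.area_mm2 ** 2


def best(points, metric):
    """Return the design point that minimises the given metric."""
    return min(points, key=metric)


# Placeholder numbers chosen only to show that the ranking can flip
# once area enters the metric; they are NOT taken from the paper.
points = [
    DesignPoint("design_a", energy_j=2.0, delay_s=1.0, area_mm2=100.0),
    DesignPoint("design_b", energy_j=1.5, delay_s=1.0, area_mm2=200.0),
]

for metric in (edp, edap, eda2p):
    print(metric.__name__, "->", best(points, metric).name)
```

With these placeholder inputs, the larger `design_b` wins on EDP while the smaller `design_a` wins on EDAP and EDA²P, which is the kind of ranking change the area-aware metrics are meant to expose.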
Keywords :
XML; cache storage; computer architecture; logic design; microprocessor chips; performance evaluation; ITRS roadmap; McPAT; PARSEC benchmark simulation; SOI; XML interface; bulk CMOS; cache sharing; chip multiprocessor; critical-path timing modeling; double-gate transistors; energy-delay-area product; in-order processor cores; integrated memory controllers; leakage power modeling; manycore architectures; manycore processor configurations; microarchitectural level; multicore architectures; multiple-domain clocking; networks-on-chip; out-of-order processor cores; performance simulator; shared caches; size 90 nm to 22 nm; Costs; Electronic design automation and methodology; Integrated circuit interconnections; Microarchitecture; Multicore processing; Out of order; Predictive models; Semiconductor device modeling; Space exploration; Timing; Performance; Verification;
Conference_Title :
Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on
Conference_Location :
New York, NY
Print_ISBN :
978-1-60558-798-1