DocumentCode :
2061422
Title :
4.2 A 20nm 32-Core 64MB L3 cache SPARC M7 processor
Author :
Li, Penny ; Shin, Jinuk Luke ; Konstadinidis, Georgios ; Schumacher, Francis ; Krishnaswamy, Venkat ; Hoyeol Cho ; Dash, Sudesna ; Masleid, Robert ; Chaoyang Zheng ; Lin, Yuanjung David ; Loewenstein, Paul ; Heechoul Park ; Srinivasan, Vijay ; Dawei Huang
Author_Institution :
Oracle, Redwood Shores, CA, USA
fYear :
2015
fDate :
22-26 Feb. 2015
Firstpage :
1
Lastpage :
3
Abstract :
The SPARC M7 processor delivers more than 3× the throughput of its predecessor, the SPARC M6, on commercial applications. It introduces new design features, including the S4 core, a 64MB L3 cache subsystem with application data integrity, a low-latency, high-throughput on-chip network (OCN), a database analytic accelerator (DAX), fine-grain adaptive power management, and 1.5× higher SerDes I/O bandwidth for memory, coherency and system interfaces (Fig. 4.2.1) [1]. The enhancements in the S4 core over the S3 core [2] include a new L2 cache scheme, support for Visual Instruction Set (VIS) extensions, virtual address masking and user-level synchronization instructions, continuing the single-thread performance improvements SPARC processors have delivered since the SPARC T4. In addition, a hierarchical modular approach, the SPARC cache cluster (SCC), is used for the core-L2-L3 cache system. Within each SCC, all four cores share a single 256KB L2 instruction cache, and each core pair has its own 256KB L2 data cache. The L2 caches are organized as two banks and eight ways to deliver more than 1TB/s of bandwidth to the four cores. This L2 system delivers 2× the throughput per core with a 1.5× increase in size and the same latency as the previous-generation L2 cache scheme. The L2 caches connect to an 8MB, 8-way set-associative partitioned L3 cache. Having a localized L3 cache within each SCC reduces L3 latency by 25%. The chip contains eight SCCs, for a total of 32 cores with 256 threads and a 64MB L3 cache with 1.6TB/s of bandwidth. To support the bandwidth and latency requirements of 256 threads and other system agents, the OCN architecture replaces the crossbar-based network used in previous SPARC processors. Each SCC connects to the OCN, which in turn connects to four on-chip memory controllers (MCUs), the coherency systems and eight DAX engines.
The SPARC M7 introduces a customized DAX engine in an effort to optimize performance for Oracle databases. The eight DAX engines handle simple query predicates, decompression, message passing and interrupts across cluster nodes. This query accelerator provides up to 10× better performance for single-stream decompression.
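The cache figures in the abstract can be cross-checked with a little arithmetic. The sketch below derives per-SCC and per-chip totals using only the numbers stated above (eight SCCs, four cores per SCC, 256 threads, a 256KB shared L2 instruction cache, a 256KB L2 data cache per core pair, and an 8MB 8-way L3 partition per SCC). The 64-byte cache line size (`LINE_BYTES`) and the simple modulo set-indexing function `l3_set_index` are hypothetical assumptions for illustration only; the abstract states neither, and the real M7 indexing and address-to-partition mapping may differ.

```python
# Derived SPARC M7 cache-hierarchy figures, using numbers from the abstract.
# LINE_BYTES and l3_set_index are hypothetical illustrations, not M7 details.
KB = 1024
MB = 1024 * KB

SCCS_PER_CHIP = 8
CORES_PER_SCC = 4
THREADS_PER_CHIP = 256

L2I_PER_SCC = 256 * KB        # shared by all four cores in an SCC
L2D_PER_CORE_PAIR = 256 * KB  # one L2 data cache per core pair
L3_PER_SCC = 8 * MB           # one 8-way set-associative L3 partition per SCC
L3_WAYS = 8
LINE_BYTES = 64               # ASSUMPTION: line size not given in the abstract


def l2_bytes_per_scc() -> int:
    """Total L2 capacity in one SCC: shared L2I plus one L2D per core pair."""
    core_pairs = CORES_PER_SCC // 2
    return L2I_PER_SCC + core_pairs * L2D_PER_CORE_PAIR


def l3_sets_per_partition(line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    return L3_PER_SCC // (L3_WAYS * line_bytes)


def l3_set_index(addr: int) -> int:
    """HYPOTHETICAL plain modulo indexing of a byte address into one partition."""
    return (addr // LINE_BYTES) % l3_sets_per_partition()


cores = SCCS_PER_CHIP * CORES_PER_SCC
summary = {
    "cores": cores,
    "threads_per_core": THREADS_PER_CHIP // cores,
    "l2_kb_per_scc": l2_bytes_per_scc() // KB,
    "l2_mb_per_chip": SCCS_PER_CHIP * l2_bytes_per_scc() // MB,
    "l3_mb_per_chip": SCCS_PER_CHIP * L3_PER_SCC // MB,
    "l3_sets_per_partition": l3_sets_per_partition(),
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

Running it prints 32 cores, 8 threads per core, 768KB of L2 per SCC (6MB per chip) and 64MB of L3 per chip, matching the totals in the abstract; the set count (16384 sets per 8MB partition) holds only under the assumed 64-byte line.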
Keywords :
cache storage; data integrity; microprocessor chips; multiprocessing systems; 32-core 64MB L3 cache SPARC M7 processor; 8-way set-associative partitioned L3 cache; DAX; L2 cache scheme; L2 data cache; L2 instruction cache; MCU; OCN; OCN architecture; Oracle databases; S4 core; SCC; SPARC M6; SPARC cache cluster; SerDes IO bandwidth; VIS; application data integrity; commercial applications; continuous single-thread performance improvement; database analytic accelerator; fine-grain adaptive power management; four on-chip memory controllers; high-throughput on-chip network; single stream decompression; throughput performance improvement; visual instruction set extensions; Bandwidth; Clocks; Engines; Network topology; Program processors; System-on-chip; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Solid-State Circuits Conference (ISSCC), 2015 IEEE International
Conference_Location :
San Francisco, CA
Print_ISBN :
978-1-4799-6223-5
Type :
conf
DOI :
10.1109/ISSCC.2015.7062931
Filename :
7062931