DocumentCode :
1664197
Title :
A 2.05GVertices/s 151mW lighting accelerator for 3D graphics vertex and pixel shading in 32nm CMOS
Author :
Sheikh, Farhana ; Mathew, Sanu ; Anders, Mark ; Kaul, Himanshu ; Hsu, Steven ; Agarwal, Amit ; Krishnamurthy, Ram ; Borkar, Shekhar
Author_Institution :
Intel, Hillsboro, OR, USA
fYear :
2012
Firstpage :
184
Lastpage :
186
Abstract :
Advanced lighting computation is the key ingredient for rendering realistic images in high-throughput 3D graphics pipelines. It is the most performance and power-critical operation in programmable vertex and pixel shaders due to the large number of complex floating-point (FP) multiplications and exponentiations [1]. Performance and energy-efficiency of geometry rendering can be significantly improved by hardware acceleration of lighting computations, which is leveraged by vertex/pixel shader programs residing in the memory of a programmable 3D graphics engine [2] (Fig. 10.4.1). A single-cycle throughput lighting accelerator targeted for on-die acceleration of 3D graphics vertex and pixel shading in high-performance processors and mobile SoCs is fabricated in 32nm high-k metal-gate CMOS [3] (Fig. 10.4.1). Ambient, diffuse, and specular components of the Phong Illumination (PI) equation [4] are computed in parallel in the log domain with 4-cycle latency and 560mV-to-1.2V operation. A high-accuracy 5-segment piecewise linear (PWL) approximation-based log circuit (FPWL-L) with low Hamming weight coefficients, a 32×32b signed truncated specular multiplier, and a high-precision 4-segment PWL approximation-based anti-log circuit (FPWL-AL) enable accurate fixed-point log-domain computation of PI lighting. Five FP multiplications and one FP exponentiation are transformed to five fixed-point additions and one fixed-point multiplication, respectively, resulting in single-cycle lighting throughput of 2.05GVertices/s (measured at 1.05V, 25°C) in a compact area of 0.064mm2 (Fig. 10.4.7) while achieving: (i) 47% reduction in critical path logic stages, (ii) 0.56% mean vertex lighting error compared to a single-precision FP computation, (iii) 354μW active leakage power measured at 1.05V, 25°C, (iv) scalable performance up to 2.22GHz, 232mW measured at 1.2V, and (Advanced lighting computation is the key ingredient for rendering realistic i- ages in high-throughput 3D graphics pipelines. It is the most performance and power-critical operation in programmable vertex and pixel shaders due to the large number of complex floating-point (FP) multiplications and exponentiations [1]. Performance and energy-efficiency of geometry rendering can be significantly improved by hardware acceleration of lighting computations, which is leveraged by vertex/pixel shader programs residing in the memory of a programmable 3D graphics engine [2] (Fig. 10.4.1). A single-cycle throughput lighting accelerator targeted for on-die acceleration of 3D graphics vertex and pixel shading in high-performance processors and mobile SoCs is fabricated in 32nm high-k metal-gate CMOS [3] (Fig. 10.4.1). Ambient, diffuse, and specular components of the Phong Illumination (PI) equation [4] are computed in parallel in the log domain with 4-cycle latency and 560mV-to-1.2V operation. A high-accuracy 5-segment piecewise linear (PWL) approximation-based log circuit (FPWL-L) with low Hamming weight coefficients, a 32×32b signed truncated specular multiplier, and a high-precision 4-segment PWL approximation-based anti-log circuit (FPWL-AL) enable accurate fixed-point log-domain computation of PI lighting. Five FP multiplications and one FP exponentiation are transformed to five fixed-point additions and one fixed-point multiplication, respectively, resulting in single-cycle lighting throughput of 2.05GVertices/s (measured at 1.05V, 25°C) in a compact area of 0.064mm2 (Fig. 10.4.7) while achieving: (i) 47% reduction in critical path logic stages, (ii) 0.56% mean vertex lighting error compared to a single-precision FP computation, (iii) 354μW active leakage power measured at 1.05V, 25°C, (iv) scalable performance up to 2.22GHz, 232mW measured at 1.2V, and (v) peak energy efficiency of 56GVertices/s/W, measured at 560mV, 25°C.v) peak energy efficiency of 56GVertices/s/W, measured at 560mV, 25°C
Keywords :
CMOS integrated circuits; image processing; nanofabrication; system-on-chip; 3D graphics engine; 3D graphics pipelines; 3D graphics vertex; Hamming weight coefficients; approximation-based anti-log circuit; frequency 2.22 GHz; geometry rendering; hardware acceleration; high-k metal-gate CMOS; high-performance processors; image rendering; mobile SoC; on-die acceleration; piecewise linear approximation-based log circuit; pixel shading; power 151 mW; power 232 mW; power 354 muW; single-cycle throughput lighting accelerator; size 32 nm; temperature 25 C; vertex/pixel shader programs; voltage 560 mV to 1.2 V; Energy efficiency; Energy measurement; Equations; Graphics; Lighting; Mathematical model; Three dimensional displays;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International
Conference_Location :
San Francisco, CA
ISSN :
0193-6530
Print_ISBN :
978-1-4673-0376-7
Type :
conf
DOI :
10.1109/ISSCC.2012.6176967
Filename :
6176967
Link To Document :
بازگشت