A 2.05GVertices/s 151mW lighting accelerator for 3D graphics vertex and pixel shading in 32nm CMOS

Abstract:

Advanced lighting computation is the key ingredient for rendering realistic images in high-throughput 3D graphics pipelines. It is the most performance- and power-critical operation in programmable vertex and pixel shaders because of the large number of complex floating-point (FP) multiplications and exponentiations it requires [1]. The performance and energy efficiency of geometry rendering can be significantly improved by hardware acceleration of lighting computations, which can then be leveraged by vertex/pixel shader programs residing in the memory of a programmable 3D graphics engine [2] (Fig. 10.4.1). A single-cycle-throughput lighting accelerator, targeted at on-die acceleration of 3D graphics vertex and pixel shading in high-performance processors and mobile SoCs, is fabricated in 32nm high-k metal-gate CMOS [3] (Fig. 10.4.1). The ambient, diffuse, and specular components of the Phong Illumination (PI) equation [4] are computed in parallel in the log domain with 4-cycle latency over a 560mV-to-1.2V operating range. A high-accuracy 5-segment piecewise linear (PWL) approximation-based log circuit (FPWL-L) with low-Hamming-weight coefficients, a 32×32b signed truncated specular multiplier, and a high-precision 4-segment PWL approximation-based anti-log circuit (FPWL-AL) enable accurate fixed-point log-domain computation of PI lighting. Five FP multiplications and one FP exponentiation are transformed into five fixed-point additions and one fixed-point multiplication, respectively, resulting in a single-cycle lighting throughput of 2.05GVertices/s (measured at 1.05V, 25°C) in a compact area of 0.064mm² (Fig. 10.4.7), while achieving: (i) a 47% reduction in critical-path logic stages, (ii) 0.56% mean vertex lighting error compared to single-precision FP computation, (iii) 354μW active leakage power measured at 1.05V, 25°C, (iv) scalable performance up to 2.22GHz (232mW) measured at 1.2V, and (v) peak energy efficiency of 56GVertices/s/W, measured at 560mV, 25°C.
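To illustrate how the log-domain transformation works, the following is a minimal behavioural sketch in floating point, not the chip's fixed-point datapath. It assumes the standard Phong form I = ka·Ia + kd·Id·(N·L) + ks·Is·(R·V)^n: the five products (one ambient, two diffuse, two specular) become five additions of log2 values, and the exponentiation (R·V)^n becomes the single multiplication n·log2(R·V). The log uses 5 segments and the anti-log uses 4, matching the segment counts in the abstract, but the breakpoints are hypothetical uniform chords, not the FPWL-L/FPWL-AL low-Hamming-weight coefficients. All function and table names (`pwl_log2`, `pwl_exp2`, `log_term`, `phong_log_domain`, `LOG_KNOTS`, `ALOG_KNOTS`) are illustrative.

```python
import math

# Hypothetical tables: uniform 5-segment (log) and 4-segment (anti-log) chord
# interpolation. The chip's FPWL-L/FPWL-AL breakpoints and coefficients are not
# given in the abstract, so these knots are stand-ins.
LOG_SEGMENTS = 5
ALOG_SEGMENTS = 4
LOG_KNOTS = [math.log2(1.0 + i / LOG_SEGMENTS) for i in range(LOG_SEGMENTS + 1)]
ALOG_KNOTS = [2.0 ** (i / ALOG_SEGMENTS) for i in range(ALOG_SEGMENTS + 1)]


def pwl_log2(x: float) -> float:
    """Approximate log2(x), x > 0: exponent + PWL(log2(1 + mantissa fraction))."""
    m, e = math.frexp(x)              # x = m * 2**e, 0.5 <= m < 1
    f = 2.0 * m - 1.0                 # x = (1 + f) * 2**(e - 1), 0 <= f < 1
    i = min(int(f * LOG_SEGMENTS), LOG_SEGMENTS - 1)
    a = i / LOG_SEGMENTS
    slope = (LOG_KNOTS[i + 1] - LOG_KNOTS[i]) * LOG_SEGMENTS
    return (e - 1) + LOG_KNOTS[i] + (f - a) * slope


def pwl_exp2(y: float) -> float:
    """Approximate 2**y: integer part is a binary shift, fraction uses PWL(2**f)."""
    k = math.floor(y)
    f = y - k                         # 0 <= f < 1
    i = min(int(f * ALOG_SEGMENTS), ALOG_SEGMENTS - 1)
    a = i / ALOG_SEGMENTS
    slope = (ALOG_KNOTS[i + 1] - ALOG_KNOTS[i]) * ALOG_SEGMENTS
    return math.ldexp(ALOG_KNOTS[i] + (f - a) * slope, k)


def log_term(factors, base=1.0, power=0.0):
    """Product of factors times base**power, evaluated as a sum of logs."""
    if min(factors) <= 0.0 or base <= 0.0:
        return 0.0                    # a non-positive factor contributes no light
    log_sum = sum(pwl_log2(v) for v in factors)   # multiplications -> additions
    if power:
        log_sum += power * pwl_log2(base)         # exponentiation -> one multiply
    return pwl_exp2(log_sum)


def phong_log_domain(ka, kd, ks, ia, i_diff, i_spec, n_dot_l, r_dot_v, shininess):
    """Ambient, diffuse, and specular terms, each computed in the log domain."""
    ambient = log_term([ka, ia])                        # 1 addition
    diffuse = log_term([kd, i_diff, n_dot_l])           # 2 additions
    specular = log_term([ks, i_spec], base=r_dot_v,     # 2 additions + 1 multiply
                        power=shininess)
    return ambient + diffuse + specular


def phong_reference(ka, kd, ks, ia, i_diff, i_spec, n_dot_l, r_dot_v, shininess):
    """Direct FP evaluation: five multiplications and one exponentiation."""
    return (ka * ia
            + kd * i_diff * max(n_dot_l, 0.0)
            + ks * i_spec * max(r_dot_v, 0.0) ** shininess)


params = dict(ka=0.1, kd=0.7, ks=0.5, ia=1.0, i_diff=0.8, i_spec=1.0,
              n_dot_l=0.6, r_dot_v=0.9, shininess=8)
approx = phong_log_domain(**params)
exact = phong_reference(**params)
print(f"log-domain {approx:.4f}  FP {exact:.4f}  error {abs(approx - exact) / exact:.2%}")
```

Because the specular path multiplies log2(R·V) by the shininess n, any log-approximation error on that path is scaled by n, which suggests why the log circuit and the specular multiplier need high precision; the uniform chords above are only a rough stand-in for the optimized segments.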