Author_Institution :
Package & Assembly Eng., Intersil Corp., Milpitas, CA, USA
Abstract :
Current generations of high-performance microprocessors feature multiple cores and micro-cores, each supporting multiple threads implemented in hardware. Such designs routinely feature billions of transistors, and chip layout teams are frequently hard pressed to place and route all the functional blocks and sub-blocks that go into the design. An additional complexity arises because system engineers would like to have each micro-core's temperature monitored for silicon reliability and system performance reasons, which translates into a requirement that each core preferably be outfitted with a thermal sensor that is routed out to the external world. Since die real estate is already at a premium and sensor macros can often be large, CPU design teams frequently shy away from placing and routing one sensor per micro-core. The practical implication is that there is no means to monitor how hot any given micro-core is getting during field operation, which can compound risk significantly from the standpoints of silicon reliability (GoX, TDDB), chip electrical performance (timing, clock skew, jitter) and system performance (real-time benchmarks, field performance, data coherency, etc.). In this study, a multi-core processor chip with a wide range of core-to-core power variability is considered. A finite number of sensor locations, which are known to be thermally sub-optimal, are assumed to be available for placement and routing. Using sensory data from these “poor” locations and an offline training algorithm, temperatures of all key core locations are determined using a causal, linear least-squares error basis. The resulting formulation is tested for prediction integrity using a large-sample Monte Carlo analysis, and the temperature predictions are found to be robust. The technique developed is general enough to be applied across any microprocessor product family.
The study concludes with suggested techniques to maintain prediction robustness in the presence of measurement errors, diode part-to-part variation and other inaccuracies. The approach proposed here can circumvent the limitations on placing and routing multiple diodes in real-estate constrained multi-core microprocessor and ASIC applications.
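
The idea described above (offline least-squares training on readings from a few poorly placed sensors, followed by a Monte Carlo check of the predictions) can be illustrated with a minimal sketch. This is not the paper's formulation: it is a static fit rather than a causal one, and the thermal model, the influence matrices G_core and G_sensor, the power range, the noise level and all function names are hypothetical placeholders chosen only to make the example run.

```python
import numpy as np

def fit_predictor(sensor_temps, core_temps):
    """Offline training: least-squares map from sensor readings (plus a bias) to core temps."""
    X = np.hstack([sensor_temps, np.ones((sensor_temps.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, core_temps, rcond=None)
    return W  # shape: (n_sensors + 1, n_cores)

def predict(W, sensor_temps):
    """Apply the trained linear map to new sensor readings."""
    X = np.hstack([sensor_temps, np.ones((sensor_temps.shape[0], 1))])
    return X @ W

def simulate(rng, n_samples, G_core, G_sensor, ambient=45.0, noise=0.0):
    """Assumed linear thermal model: temperature = ambient + G @ per-core power."""
    n_cores = G_core.shape[1]
    # Wide core-to-core power variability (watts); the range is made up.
    power = rng.uniform(0.5, 10.0, size=(n_samples, n_cores))
    core = ambient + power @ G_core.T
    sensor = ambient + power @ G_sensor.T
    if noise > 0:
        # Stand-in for measurement error and diode part-to-part variation.
        sensor = sensor + rng.normal(0.0, noise, size=sensor.shape)
    return sensor, core

rng = np.random.default_rng(0)
n_cores, n_sensors = 8, 4

# Hypothetical thermal influence matrices (degC per watt).
G_core = 0.2 * rng.random((n_cores, n_cores)) + 1.5 * np.eye(n_cores)
G_sensor = 0.6 * rng.random((n_sensors, n_cores))

# Offline training on simulated data.
S_train, T_train = simulate(rng, 2000, G_core, G_sensor)
W = fit_predictor(S_train, T_train)

# Monte Carlo evaluation on fresh random power maps with noisy sensors.
S_test, T_test = simulate(rng, 10000, G_core, G_sensor, noise=0.5)
err = predict(W, S_test) - T_test
rmse_per_core = np.sqrt((err ** 2).mean(axis=0))
print("RMSE per core (degC):", np.round(rmse_per_core, 2))
```

With fewer sensors than cores the fit cannot be exact, so the per-core error depends on how much of each core's temperature the sensor locations can observe. In a real study the training and test data would come from thermal simulation or measurement rather than from this toy model.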
Keywords :
Monte Carlo methods; application specific integrated circuits; clocks; elemental semiconductors; least squares approximations; microprocessor chips; multiprocessing systems; semiconductor device reliability; silicon; temperature measurement; temperature sensors; timing jitter; ASIC applications; CPU design; Monte Carlo analysis; chip electrical performance; chip layout; clock skew; jitter; least-squares error; measurement errors; microcores temperature monitoring; microprocessors; multicore chips; multiple diodes; sensory data; silicon reliability; temperature predictions; thermal sensor; timing; Histograms; Microprocessors; Multicore processing; Temperature distribution; Temperature measurement; Temperature sensors; CPU; Multi-core; diodes; least-squares; monitoring; prediction; temperature;