Title :
Improving yield and reliability of chip multiprocessors
Author :
Pan, Abhisek ; Khan, Omer ; Kundu, Sandip
Author_Institution :
Univ. of Massachusetts, Amherst, MA
Abstract :
An increasing number of hardware failures can be attributed to device reliability problems that cause partial system failure or shutdown. In this paper, we propose a scheme for improving the reliability of a homogeneous chip multiprocessor (CMP) that also improves manufacturing yield. Our solution exploits the natural redundancy that already exists in multi-core systems: when a functional unit in a core is defective, that core uses the corresponding unit of another core instead. A micro-architectural modification allows a core on a CMP to use another core as a coprocessor to service any instruction that the former cannot execute correctly. This service improves yield and reliability at the cost of some performance. To quantify this loss, we used a cycle-accurate simulator to measure the performance of a dual-core system in which one or both cores sustain partial failure. Our results indicate that when a large and sparingly used unit such as a floating point arithmetic unit fails in a core, each faulty core can continue to run with help from companion cores, even on a floating point intensive benchmark, with as little as 10% impact on performance and less than 1% area overhead.
Keywords :
floating point arithmetic; integrated circuit reliability; integrated circuit yield; microprocessor chips; multiprocessing systems; cycle-accurate simulator; device reliability problem; dual-core system; faulty core; floating point arithmetic unit; hardware failure; homogeneous chip multiprocessor; manufacturing yield; microarchitectural modification; multicore system; natural redundancy; partial system failure; partial system shutdown; Costs; Frequency; Hardware; Niobium compounds; Performance loss; Redundancy; Stress; Temperature; Titanium compounds; Voltage; microarchitecture; multiprocessors; reliability; yield;
Conference_Title :
Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.
Conference_Location :
Nice
Print_ISBN :
978-1-4244-3781-8
DOI :
10.1109/DATE.2009.5090714