DocumentCode :
2173008
Title :
Improving yield and reliability of chip multiprocessors
Author :
Pan, Abhisek ; Khan, Omer ; Kundu, Sandip
Author_Institution :
Univ. of Massachusetts, Amherst, MA
fYear :
2009
fDate :
20-24 April 2009
Firstpage :
490
Lastpage :
495
Abstract :
An increasing number of hardware failures can be attributed to device reliability problems that cause partial system failure or shutdown. In this paper, we propose a scheme for improving the reliability of a homogeneous chip multiprocessor (CMP) that also serves to improve manufacturing yield. Our solution centers on exploiting the natural redundancy that already exists in multi-core systems by using services from other cores for functional units that are defective in a faulty core. A micro-architectural modification allows a core on a CMP to use another core as a coprocessor to service any instruction that the former cannot execute correctly. This service improves yield and reliability, but at the cost of some loss of performance. To quantify this loss, we used a cycle-accurate simulator to simulate the performance of a dual-core system with one or two cores sustaining partial failure. Our results indicate that when a large and sparingly used unit such as a floating point arithmetic unit fails in a core, even for a floating point intensive benchmark, we can continue to run each faulty core with help from companion cores with as little as 10% impact on performance and less than 1% area overhead.
Keywords :
floating point arithmetic; integrated circuit reliability; integrated circuit yield; microprocessor chips; multiprocessing systems; cycle-accurate simulator; device reliability problem; dual-core system; faulty core; floating point arithmetic unit; hardware failure; homogeneous chip multiprocessor; manufacturing yield; microarchitectural modification; multicore system; natural redundancy; partial system failure; partial system shutdown; Costs; Frequency; Hardware; Niobium compounds; Performance loss; Redundancy; Stress; Temperature; Titanium compounds; Voltage; microarchitecture; multiprocessors; reliability; yield;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.
Conference_Location :
Nice
ISSN :
1530-1591
Print_ISBN :
978-1-4244-3781-8
Type :
conf
DOI :
10.1109/DATE.2009.5090714
Filename :
5090714