DocumentCode :
3053361
Title :
Algorithm-based fault tolerance for many-core architectures
Author :
Braun, Claus ; Wunderlich, Hans-Joachim
Author_Institution :
Inst. of Comput. Archit. & Comput. Eng., Univ. of Stuttgart, Stuttgart, Germany
fYear :
2010
fDate :
24-28 May 2010
Firstpage :
253
Lastpage :
253
Abstract :
Modern many-core architectures with hundreds of cores offer high computational potential, which makes them particularly attractive for scientific high-performance computing and simulation technology. Like all nanoscaled semiconductor devices, many-core processors are prone to reliability-harming factors such as variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance, in which the software detects and corrects errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components of many scientific applications.
Keywords :
multiprocessing systems; parallel architectures; software fault tolerance; high performance scientific computing; high performance scientific simulation; many-core architectures; many-core processors; matrix operation reliability; nanoscaled semiconductor devices; software based hardware fault tolerance; Computational modeling; Computer architecture; Encoding; Error correction; Fault tolerance; Fault tolerant systems; Hardware; Reliability engineering; Semiconductor devices; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Test Symposium (ETS), 2010 15th IEEE European
Conference_Location :
Praha
ISSN :
1530-1877
Print_ISBN :
978-1-4244-5834-9
Electronic_ISBN :
1530-1877
Type :
conf
DOI :
10.1109/ETSYM.2010.5512738
Filename :
5512738