DocumentCode
703946
Title
Improving MPSoC reliability through adapting runtime task schedule based on time-correlated fault behavior
Author
Rozo Duque, Laura A.; Monsalve Diaz, Jose M.; Yang, Chengmo
Author_Institution
Electr. & Comput. Eng., Univ. of Delaware, Newark, DE, USA
fYear
2015
fDate
9-13 March 2015
Firstpage
818
Lastpage
823
Abstract
The increasing susceptibility of multicore systems to temperature variations, environmental factors, and various aging effects has made system reliability a crucial concern. Because these factors are unpredictable, fault behavior is diverse in nature, and the runtime task scheduler should take this diversity into account to improve overall system reliability. To this end, this paper proposes a fault-tolerant approach that models core reliability at runtime and tunes resource allocation accordingly. Given the variation in fault duration, the proposed reliability model tracks not only the faults that appear in each core but also their correlation in time. Taking this model as input, the paper also proposes a runtime scheduling algorithm that allocates critical and vulnerable tasks to reliable cores. Experimental results show that the proposed adaptive technique delivers up to a 56% improvement in application execution time compared to other techniques.
Keywords
fault tolerance; integrated circuit reliability; multiprocessing systems; resource allocation; scheduling; system-on-chip; MPSoC reliability; aging effects; environmental issues; fault duration; fault tolerant approach; multicore systems; runtime scheduling algorithm; runtime task schedule; temperature variations; time-correlated fault behavior; Adaptation models; Fault tolerance; Fault tolerant systems; Resource management; Runtime; Schedules
fLanguage
English
Publisher
IEEE
Conference_Title
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
Conference_Location
Grenoble
Print_ISBN
978-3-9815-3704-8
Type
conf
Filename
7092498