Title : 
HEFT: A hybrid system-level framework for enabling energy-efficient fault-tolerance in NoC based MPSoCs

Author :
Zou, Yong ; Pasricha, Sudeep

Author_Institution :
Dept. of Electr. & Comput. Eng., Colorado State Univ., Fort Collins, CO, USA

Abstract :
In emerging CMOS process technologies, network-on-chip (NoC) fabrics are becoming increasingly susceptible to transient faults. The fault-tolerance mechanisms typically employed in NoCs entail significant energy overheads, which are expected to become prohibitive as fault rates rise in future CMOS technologies. We propose HEFT, a hybrid system-level framework that trades off energy consumption against fault tolerance in the NoC fabric. HEFT tackles the challenge of enabling energy-efficient resilience in NoCs in two phases: at design time and at runtime. At design time, we implement an algorithm that guides the robust mapping of cores onto a die while satisfying application bandwidth and latency constraints. At runtime, we devise a prediction algorithm that monitors NoC components and detects changes in their fault susceptibility, so that energy consumption and reliability can be balanced intelligently. Experimental results show that, compared with multiple prior works on reliable system-level NoC design, HEFT improves the energy/reliability ratio of synthesized solutions by 8-20% while meeting application performance goals.

Keywords :
CMOS integrated circuits; energy conservation; energy consumption; fault tolerance; integrated circuit design; integrated circuit reliability; multiprocessing systems; network-on-chip; transient analysis; CMOS process technology; CMOS technology; HEFT; NoC based MPSoC; NoC component; NoC fabric; application bandwidth; energy-efficient fault-tolerance; energy-efficient resilience; fault rate; fault susceptibility; fault-tolerance mechanism; hybrid system-level framework; latency constraint; network-on-chip fabric; prediction algorithm; reliability; reliable system-level NoC design; robust mapping; transient fault; Bandwidth; Fault tolerance; Fault tolerant systems; Reliability engineering; Runtime; System-level design; fault-tolerance; networks-on-chip

Conference_Title :
2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

Conference_Location :
New Delhi

DOI :
10.1145/2656075.2656087