Title : 
SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory
         
        
            Author : 
Yalcin, Gulay ; Unsal, Osman S. ; Cristal, Adrian ; Hur, Ibrahim ; Valero, Mateo
         
        
            Author_Institution : 
Artificial Intell. Res. Inst., Spanish Nat. Res. Council, Spain
         
        
        
        
        
        
            Abstract : 
Fault tolerance has become an essential concern for processor designers due to increasing transient and permanent fault rates. In this study, we propose SymptomTM, a symptom-based error detection technique that recovers from errors by leveraging the abort mechanism of Transactional Memory (TM). To the best of our knowledge, this is the first architectural fault-tolerance proposal using Hardware Transactional Memory (HTM). SymptomTM can recover from 86% and 65% of catastrophic failures caused by transient and permanent errors, respectively, with no performance overhead in error-free executions.
         
        
            Keywords : 
fault tolerant computing; system recovery; transaction processing; SymptomTM; architectural fault tolerance; hardware transactional memory; permanent fault rate; processor designer; symptom-based error detection; symptom-based error recovery; transient fault rate; Fault tolerance; Fault tolerant systems; Hardware; Monitoring; Proposals; Transient analysis; Fault Tolerance; Hardware Transactional Memory;
         
        
        
        
            Conference_Title : 
Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on
         
        
            Conference_Location : 
Galveston, TX
         
        
        
            Print_ISBN : 
978-1-4577-1794-9
         
        
        
            DOI : 
10.1109/PACT.2011.39