DocumentCode
709260
Title
C´Mon: a predictable monitoring infrastructure for system-level latent fault detection and recovery
Author
Song, Jiguo ; Parmer, Gabriel
Author_Institution
George Washington Univ., Washington, DC, USA
fYear
2015
fDate
13-16 April 2015
Firstpage
247
Lastpage
258
Abstract
Embedded and real-time systems must balance many, often conflicting, goals, including predictability, high utilization, efficiency, reliability, and SWaP (size, weight, and power). Reliability is particularly difficult to achieve without significantly impacting the other factors. Though reliability solutions exist at the application level, they are invalidated by system-level faults, which are particularly difficult to detect and recover from. This paper presents the C´Mon system for predictably and efficiently monitoring system-level execution and validating that it conforms to the high-level analytical models that underlie the timing guarantees of the system. Latent faults such as timing errors, incorrect scheduler decisions, unbounded priority inversions, or deadlocks are detected, the faulty component is identified, and, using previous work in system recovery, the system is brought back to a stable state, all without missing deadlines.
Keywords
embedded systems; fault diagnosis; program diagnostics; system recovery; C´Mon system; deadlocks; embedded system; incorrect scheduler decisions; predictable monitoring infrastructure; real-time systems; system timing guarantees; system-level execution monitoring; system-level latent fault detection; system-level latent fault recovery; timing errors; unbounded priority inversions; Computational modeling; Fault tolerant systems; Instruction sets; Monitoring; Real-time systems; Synchronization;
fLanguage
English
Publisher
ieee
Conference_Titel
Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015 IEEE
Conference_Location
Seattle, WA
Type
conf
DOI
10.1109/RTAS.2015.7108448
Filename
7108448