Title :
On-the-fly healing of race conditions in ARINC-653 flight software
Author :
Ha, Ok-Kyoon ; Tchamgoue, Guy Martin ; Suh, Jeong-Bae ; Jun, Yong-Kee
Author_Institution :
Dept. of Inf., Gyeongsang Nat. Univ., Jinju, South Korea
Abstract :
The ARINC-653 standard architecture for flight software specifies an application executive (APEX), which provides an application programming interface, and defines a hierarchical health management framework for error detection and recovery. In every partition of the architecture, however, asynchronously concurrent processes or threads may contain concurrency bugs such as unintended race conditions, which are common and difficult to remove by testing. A race condition on shared data, or data race, is a pair of unsynchronized instructions that access the same shared variable, at least one of which is a write. Data races are a serious and latent threat to the reliability of shared-memory programs because they cause unintended nondeterministic executions. To heal data races during the execution of ARINC-653 flight software, this paper instruments the target program with on-the-fly race detection and incorporates on-the-fly race healing into the health management of the ARINC-653 architecture. When a data race is detected, the race detector signals the health monitor through the corresponding APEX call. The health monitor then responds by invoking an aperiodic, user-defined error-handling process that is assigned the highest possible priority. This special process uses an APEX call to identify the race condition as an application error, one of the seven error types defined by ARINC-653, and then heals it. The race-healing process assures at run time that the execution result of the healed program is one that the original program could have produced, and therefore that no new functional bug has been introduced. The paper evaluates the efficiency of the on-the-fly mechanisms to argue that they are practical to configure within ARINC-653 partitions.
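The detect-then-report flow described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not name its detection algorithm, so a simple vector-clock happens-before check is swapped in here, and `HealthMonitor`, `RaceDetector`, `create_error_handler`, and `raise_application_error` are hypothetical Python stand-ins rather than the real APEX interface. The healing policy is not described in the abstract, so the handler below only records the reported error.

```python
# Hypothetical stand-in for the ARINC-653 health monitor (not the APEX API).
class HealthMonitor:
    def __init__(self):
        self.handler = None

    def create_error_handler(self, handler):
        # Register the user-defined error-handling routine.
        self.handler = handler

    def raise_application_error(self, message):
        # Forward a reported application error to the handler, if any.
        if self.handler is not None:
            self.handler(message)


def ordered_before(earlier, later):
    """True if vector clock `earlier` is componentwise <= `later`."""
    return all(v <= later.get(k, 0) for k, v in earlier.items())


# Assumed detector: vector clocks with lock-based synchronization only.
class RaceDetector:
    def __init__(self, monitor):
        self.monitor = monitor
        self.clocks = {}       # process name -> its vector clock
        self.lock_clocks = {}  # lock name -> clock stored at last release
        self.history = {}      # variable -> [(process, is_write, clock copy)]

    def _clock(self, proc):
        return self.clocks.setdefault(proc, {proc: 1})

    def acquire(self, proc, lock):
        # Acquiring a lock merges in the clock of its last release.
        clock = self._clock(proc)
        for k, v in self.lock_clocks.get(lock, {}).items():
            clock[k] = max(clock.get(k, 0), v)

    def release(self, proc, lock):
        # Releasing publishes the current clock, then advances local time.
        clock = self._clock(proc)
        self.lock_clocks[lock] = dict(clock)
        clock[proc] += 1

    def access(self, proc, var, is_write):
        now = self._clock(proc)
        for prev_proc, prev_write, prev_clock in self.history.get(var, []):
            conflicting = prev_proc != proc and (prev_write or is_write)
            if conflicting and not ordered_before(prev_clock, now):
                self.monitor.raise_application_error(
                    f"data race on {var}: {prev_proc} vs {proc}")
        self.history.setdefault(var, []).append((proc, is_write, dict(now)))


def run(protected):
    """P1 writes x, then P2 reads x; optionally guard both with lock m."""
    monitor = HealthMonitor()
    handled = []  # the handler only records errors in this sketch
    monitor.create_error_handler(handled.append)
    detector = RaceDetector(monitor)
    for proc, is_write in (("P1", True), ("P2", False)):
        if protected:
            detector.acquire(proc, "m")
        detector.access(proc, "x", is_write)
        if protected:
            detector.release(proc, "m")
    return handled
```

Calling `run(False)` models two unsynchronized accesses to `x`, which the detector reports to the monitor as a single application error; `run(True)` guards both accesses with the same lock, so the accesses are ordered and nothing is reported.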
Keywords :
aerospace computing; application program interfaces; avionics; concurrency control; error detection; program debugging; shared memory systems; software reliability; system recovery; APEX; ARINC-653 standard architecture; ARINC-653 flight software; application executive; application programming interface; asynchronously concurrent process; concurrency bug; data race; error detection; error recovery; health management; nondeterministic execution; shared-memory program reliability; unsynchronized instruction; write access; Computer architecture; Concurrent computing; History; Instruction sets; Monitoring; Protocols;
Conference_Title :
Digital Avionics Systems Conference (DASC), 2010 IEEE/AIAA 29th
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4244-6616-0
DOI :
10.1109/DASC.2010.5655315