Abstract:
To provide continuity of operations in automated systems, we need to develop techniques that make them reliable. Many systems, such as those used in space programs, air traffic control, nuclear plant monitors, and ballistic missile defense, demand robust operation. In the past, research efforts have focused on the design and implementation of distributed systems used in such applications. We foresee a need for research into algorithms and system structures that make error and failure detection, reconfiguration, recovery, and restart of a system feasible with minimal interruption.