Title :
Designing for Recovery: New Challenges for Large-Scale Complex IT Systems
Author :
Sommerville, Ian
Author_Institution :
St. Andrews Univ., St. Andrews
Abstract :
Summary form only given. Since the 1980s, the object of design for dependability has been to avoid, detect or tolerate system faults so that they do not result in failures that are detectable outside the system. Whilst this is potentially achievable in medium-sized systems that are controlled by a single organisation, it is now practically impossible to achieve in large-scale systems of systems, where different parts of the system are owned and controlled by different organisations. Therefore, we must accept the inevitability of failure and re-orient our system design strategies to recover from those failures as quickly as possible and at minimal cost. This talk will discuss why such recovery strategies cannot be purely technical but must be socio-technical in nature, and will argue that design for recovery requires a better understanding of how people recover from failure and of the information they need during that recovery process. I will argue that supporting recovery should be a fundamental design objective of systems and explore what this means for current approaches to large-scale systems design.
Keywords :
fault tolerant computing; system recovery; design for dependability; large-scale complex IT systems; large-scale systems of systems; system fault detection; system fault tolerance; Control systems; Costs; Fault detection; Large-scale systems; Object detection; Size control; Software design; Software systems
Conference_Title :
Composition-Based Software Systems, 2008. ICCBSS 2008. Seventh International Conference on
Conference_Location :
Madrid
Print_ISBN :
978-0-7695-3091-8
DOI :
10.1109/ICCBSS.2008.42