Title :
Mutable Protection Domains: Adapting System Fault Isolation for Reliability and Efficiency
Author :
Parmer, Gabriel ; West, Richard
Author_Institution :
Dept. of Comput. Sci., George Washington Univ., Washington, DC, USA
Abstract :
As software systems are becoming increasingly complex, the likelihood of faults and unexpected behaviors will naturally increase. Today, systems from mobile devices to large-scale servers feature many millions of lines of code. Compile-time checks and offline verification methods are unlikely to capture all system states and control flow interactions of a running system. For this reason, many researchers have developed methods to contain faults at runtime by using software and hardware-based techniques to define protection domains. However, these approaches tend to impose isolation boundaries on software components that are static, and thus remain intact while the system is running. An unfortunate consequence of statically structured protection domains is that they may impose undue overhead on the communication between separate components. This paper proposes a new runtime technique that trades communication cost for fault isolation. We describe Mutable Protection Domains (MPDs) in the context of our Composite operating system. MPD dynamically adapts hardware isolation between interacting software components, depending on observed communication “hot-paths,” with the purpose of maximizing fault isolation where possible. In this sense, MPD naturally tends toward a system of maximal component isolation, while collapsing protection domains where costs are prohibitive. By increasing isolation for low-cost interacting components, MPD limits the scope of impact of future unexpected faults. We demonstrate the utility of MPD using a webserver, and identify different hot-paths for different workloads that dictate adaptations to system structure. Experiments show up to 40 percent improvement in throughput compared to a statically organized system, while maintaining high fault isolation.
Keywords :
fault tolerant computing; mobile computing; object-oriented programming; operating systems (computers); MPD; compile-time checks; composite operating system; control flow interactions; fault isolation; hardware-based techniques; large-scale servers; maximal component isolation; mobile devices; mutable protection domains; offline verification methods; software-based techniques; software components; software systems; system fault isolation; Hardware; Kernel; Reliability; Servers; Switches; Component-based; operating systems; performance; reliability
Journal_Title :
Software Engineering, IEEE Transactions on
DOI :
10.1109/TSE.2011.61