Title :
Using Lightweight Transactions and Snapshots for Fault-Tolerant Services Based on Shared Storage Bricks
Author :
Flouris, M.D. ; Lachaize, Renaud ; Bilas, Angelos
Author_Institution :
ICS-FORTH, Heraklion
Abstract :
To satisfy current and future application needs in a cost-effective manner, storage systems are evolving from monolithic disk arrays to networked storage architectures based on commodity components. So far, this architectural transition has mostly been envisioned as a way to scale capacity and performance. In this work we examine how the block-level interface exported by such networked storage systems can be extended to deal with reliability. Our goals are: (a) at the design level, to examine how strong reliability semantics can be offered at the block level; and (b) at the implementation level, to examine the mechanisms required and how they may be provided in a modular and configurable manner. We first discuss how transactional-type semantics may be offered at the block level. We present a system design that uses the concept of atomic update intervals combined with existing block-level locking and snapshot mechanisms, in contrast to the more common journaling techniques. We discuss in detail the design of the associated mechanisms, as well as the trade-offs and challenges that arise when dividing the required functionality between the file system and the block-level storage. Our approach is based on a unified, and thus non-redundant, set of mechanisms for providing reliability at both the block and file levels. Our design and implementation effectively provide a tunable, lightweight transaction mechanism to higher system and application layers. Finally, we describe how the associated protocols can be implemented in a modular way in a prototype storage system we are currently building. As our system is currently being implemented, we do not present performance results.
Keywords :
fault tolerant computing; storage management; atomic update interval; block-level interface; block-level locking; block-level snapshot mechanism; fault-tolerant services; lightweight transactions; networked storage systems; reliability semantics; shared storage bricks; Assembly systems; Availability; Computer science; Costs; Databases; Fault tolerance; Network servers; Scalability; Storage area networks; Workstations;
Conference_Title :
Cluster Computing, 2006 IEEE International Conference on
Conference_Location :
Barcelona
Print_ISBN :
1-4244-0327-8
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2006.311896