Title :
TPT-RAID: a High Performance Box-Fault Tolerant Storage System
Author :
Birk, Yitzhak ; Zilber, Erez
Author_Institution :
Technion - Israel Inst. of Technol., Haifa
Abstract :
TPT-RAID is a multi-box RAID wherein each ECC group comprises at most one block from any given storage box, and can thus tolerate a box failure. It extends the idea of an out-of-band SAN controller into the RAID: data is sent directly between hosts and targets and among targets, and the RAID controller supervises ECC calculation by the targets. By preventing a communication bottleneck in the controller, excellent scalability is achieved while retaining the simplicity of centralized control. TPT-RAID, whose controller can be a software module within an out-of-band SAN controller, moreover conforms to a conventional switched network architecture, whereas an in-band RAID controller would either constitute a communication bottleneck or would have to also be a full-fledged router. The design is validated in an InfiniBand-based prototype using iSCSI and iSER, and required changes to relevant protocols are introduced.
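The placement rule in the abstract (each ECC group holds at most one block from any given box) can be illustrated with a minimal sketch. It assumes single XOR parity as the ECC, which the abstract does not specify; the function names and the in-memory box layout are hypothetical and are not taken from the TPT-RAID prototype.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_group(data_blocks):
    """Place each block of one ECC group in a different box.

    Returns {box_id: block}; the last box holds the parity block
    (XOR parity is an assumption made for this illustration).
    """
    group = {box_id: block for box_id, block in enumerate(data_blocks)}
    group[len(data_blocks)] = xor_blocks(data_blocks)
    return group


def recover(group, failed_box):
    """Rebuild the block of a failed box from the surviving boxes."""
    survivors = [blk for box_id, blk in group.items() if box_id != failed_box]
    return xor_blocks(survivors)


# Three data blocks spread over boxes 0..2, parity in box 3.
group = build_group([b"abcd", b"efgh", b"ijkl"])
rebuilt = recover(group, failed_box=1)
```

Because no box holds more than one block of the group, losing an entire box removes only one block, which the surviving blocks suffice to rebuild under this single-parity assumption.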
Keywords :
RAID; centralised control; control engineering computing; error correction; fault tolerance; SAN controller; TPT-RAID; centralized control; error correcting group; full-fledged router; high performance box-fault tolerant storage system; switched network architecture; Centralized control; Communication switching; Communication system control; Computer architecture; Error correction codes; Fault tolerance; Joining processes; Scalability; Storage area networks; Switches;
Conference_Title :
Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-0-7695-3025-3
DOI :
10.1109/MSST.2007.4367975