DocumentCode :
3588916
Title :
Consistency and Fault Tolerance Considerations for the Next Iteration of the DOE Fast Forward Storage and IO Project
Author :
Lofstead, Jay ; Jimenez, Ivo ; Maltzahn, Carlos
fYear :
2014
Firstpage :
61
Lastpage :
69
Abstract :
The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project is going to have significant impact on storage systems design within and beyond the HPC community. With phase 1 of the project complete, it is an excellent opportunity to evaluate many of the decisions made to feed into the phase 2 effort. With this paper we not only provide a timely summary of important aspects of the design specifications but also capture the underlying reasoning that is not available elsewhere. The initial effort to define a next generation storage system has made admirable contributions in architecture and design. Formalizing the general idea of data staging into burst buffers for the storage system will help manage the performance variability and offer additional data processing opportunities outside the main compute and storage system. Adding a transactional mechanism to manage faults and data visibility helps enable effective analytics without having to work around the IO stack semantics. While these and other contributions are valuable, similar efforts made elsewhere may offer attractive alternatives or differing semantics that could yield a more feature-rich environment with little to no additional overhead. For example, the Doubly Distributed Transactions (D2T) protocol offers an alternative approach for incorporating transactional semantics into the data path. Another project, PreDatA, examined how to get the best throughput for data operators and may offer additional insights into further refinements of the Burst Buffer concept. This paper examines some of the choices made by the Fast Forward team and compares them with other options and offers observations and suggestions based on these other efforts. 
This will include some non-core contributions of other projects, such as some of the demonstration metadata and data storage components generated while implementing D2T, to make suggestions that may help the next generation design for how the IO stack works as a whole.
Keywords :
fault tolerance; meta data; parallel processing; storage management; D2T protocol; DOE extreme-scale technology acceleration fast forward storage; DOE fast forward storage; HPC community; IO project; IO stack project; IO stack semantics; PreDatA; burst buffer concept; data operator; data processing opportunity; data storage component; demonstration metadata; doubly distributed transactions protocol; fault tolerance consideration; next generation storage system; next iteration; performance variability; storage systems design; transactional semantics; Arrays; Bandwidth; Buffer storage; Containers; Random access memory; Writing; Fast Forward Storage and I/O; HDF5; file system; parallel file system; transactions;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
ISSN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2014.21
Filename :
7103439