DocumentCode :
2518506
Title :
Active Flash: Out-of-core data analytics on flash storage
Author :
Boboila, Simona ; Kim, Youngjae ; Vazhkudai, Sudharshan S. ; Desnoyers, Peter ; Shipman, Galen M.
fYear :
2012
fDate :
16-20 April 2012
Firstpage :
1
Lastpage :
12
Abstract :
Next-generation science will increasingly rely on the ability to perform efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations that model complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis. This chaining creates multiple rounds of redundant reads and writes to the storage system, and their cost grows with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute-node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, that expedites data analysis pipelines by migrating the analysis to the location of the data: the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. Performing analysis locally reduces dependence on the limited bandwidth to a central storage system and allows this analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore the energy and performance trade-offs of moving computation from host to storage, demonstrate that appropriate embedded controllers can perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model and its potential for a transformative impact on scientific data analysis.
Keywords :
data analysis; data reduction; flash memories; input-output programs; natural sciences computing; scheduling; storage management; HPC acquisitions; HPC clusters; HPC system scalability; I/O bottleneck; active flash model; active flash scheduling policies; central storage system; compute node-local flash storage; compute speeds; data analysis tasks; data location migration; data reduction tasks; high-performance computing simulations; next generation science; offloading work; on-the-fly data analytics; out-of-core data analytics; peak system power usage reduction; power-efficient controller; scientific computing workflows; storage speeds; Analytical models; Ash; Bandwidth; Computational modeling; Computer architecture; Data analysis; Data models
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on
Conference_Location :
San Diego, CA
ISSN :
2160-195X
Print_ISBN :
978-1-4673-1745-0
Electronic_ISBN :
2160-195X
Type :
conf
DOI :
10.1109/MSST.2012.6232366
Filename :
6232366