• DocumentCode
    2518506
  • Title
    Active Flash: Out-of-core data analytics on flash storage
  • Author
    Boboila, Simona ; Kim, Youngjae ; Vazhkudai, Sudharshan S. ; Desnoyers, Peter ; Shipman, Galen M.
  • fYear
    2012
  • fDate
    16-20 April 2012
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Next-generation science will increasingly rely on efficient, on-the-fly analytics of data generated by high-performance computing (HPC) simulations that model complex physical phenomena. Scientific computing workflows are stymied by the traditional chaining of simulation and data analysis, which creates multiple rounds of redundant reads and writes to the storage system; the cost of this I/O grows with the ever-increasing gap between compute and storage speeds in HPC clusters. Recent HPC acquisitions have introduced compute node-local flash storage as a means to alleviate this I/O bottleneck. We propose a novel approach, Active Flash, to expedite data analysis pipelines by migrating analysis to the location of the data, the flash device itself. We argue that Active Flash has the potential to enable true out-of-core data analytics by freeing up both the compute core and the associated main memory. Performing analysis locally reduces dependence on the limited bandwidth to a central storage system and allows the analysis to proceed in parallel with the main application. In addition, offloading work from the host to the more power-efficient controller reduces peak system power usage, which is already in the megawatt range and poses a major barrier to HPC system scalability. We propose an architecture for Active Flash, explore energy and performance trade-offs in moving computation from host to storage, demonstrate that appropriate embedded controllers can perform data analysis and reduction tasks at speeds sufficient for this application, and present a simulation study of Active Flash scheduling policies. These results show the viability of the Active Flash model and its potential to have a transformative impact on scientific data analysis.
  • Keywords
    data analysis; data reduction; flash memories; input-output programs; natural sciences computing; scheduling; storage management; HPC acquisitions; HPC clusters; HPC system scalability; I-O bottleneck; active flash model; active flash scheduling policies; central storage system; compute node-local flash storage; compute speeds; data analysis; data analysis tasks; data location migration; data reduction tasks; high-performance computing simulations; next generation science; offloading work; on-the-fly data analytics; out-of-core data analytics; peak system power usage reduction; power-efficient controller; scientific computing workflows; storage speeds; Analytical models; Ash; Bandwidth; Computational modeling; Computer architecture; Data analysis; Data models;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on
  • Conference_Location
    San Diego, CA
  • ISSN
    2160-195X
  • Print_ISBN
    978-1-4673-1745-0
  • Electronic_ISBN
    2160-195X
  • Type
    conf
  • DOI
    10.1109/MSST.2012.6232366
  • Filename
    6232366