Query-driven parallel exploration of large datasets

Author

Atanasov, Atanas ; Srinivasan, Madhusudhanan ; Weinzierl, Tobias

Author_Institution

Tech. Univ. München, Munich, Germany

fYear

2012

fDate

14-15 Oct. 2012

Firstpage

23

Lastpage

30

Abstract

Recent advances in supercomputing capabilities pose a multi-faceted data retrieval challenge to the exploration and visualisation of the obtained results: the bandwidth between visualisation devices and the high-performance computing (HPC) clusters neither scales with the simulation data nor with the compute power, the total memory footprint of the data on the supercomputer often exceeds the aggregate memory on the visualisation, and the data has to be distributed among several visualisation nodes working in parallel to render a visual. In the present paper, we introduce an on-demand data exploration paradigm that leverages HPC capabilities and distributed visualisation without requiring a large memory footprint on the visualisation cluster. Regions of interest within the data are specified by the user in the form of queries. These queries, augmented by node identifiers on the visualisation cluster, are automatically distributed among multiple compute nodes of the HPC cluster. The compute nodes work in parallel to assemble and merge data in response to the user query until the data distribution matches the visualisation cluster's topology. Query results are then simultaneously streamed to the right visualisation nodes. Our approach allows for interactive exploration of data residing on HPC resources, irrespective of memory footprint. The streaming of data to the visualisation nodes scales with the bandwidth of the interconnecting network and the HPC cluster's domain decomposition, while the latter is hidden from the visualisation and can change dynamically. We demonstrate the capability of our query-driven approach with a turbulent mixing dataset, and show that it supports interactive data exploration on HPC systems.

Keywords

data visualisation; interactive systems; parallel processing; query processing; rendering (computer graphics); HPC clusters; data distribution; distributed visualisation; domain decomposition; high-performance computing; interactive data exploration; interconnecting network bandwidth; large datasets; multifaceted data retrieval; on-demand data exploration paradigm; parallel visualisation nodes; query-driven approach; query-driven parallel exploration; rendering; simulation data; supercomputer; turbulent mixing dataset; user query response; visualisation devices; Computational modeling; Data models; Data visualization; Distributed databases; Load modeling; Supercomputers; Topology; On-demand data exploration; computational steering; large-scale data

fLanguage

English

Publisher

IEEE

Conference_Titel

Large Data Analysis and Visualization (LDAV), 2012 IEEE Symposium on

Conference_Location

Seattle, WA

Print_ISBN

978-1-4673-4732-7

Type

conf

DOI

10.1109/LDAV.2012.6378972

Filename

6378972