Title :
Robust graph traversal: Resiliency techniques for data intensive supercomputing
Author :
Hukerikar, Saurabh ; Diniz, Pedro C. ; Lucas, Robert F.
Author_Institution :
Inf. Sci. Inst., Univ. of Southern California, Marina del Rey, CA, USA
Abstract :
Emerging large-scale, data-intensive applications that use the graph abstraction to represent problems across a broad spectrum of scientific and analytics domains have radically different features from floating-point-intensive scientific applications. Besides having large working datasets, these complex graph applications exhibit very low spatial and temporal locality, which makes designing algorithmic fault tolerance for them quite challenging. They will run on future exascale-class High Performance Computing (HPC) systems that will contain a massive number of components and will be built from devices far less reliable than those used today. In this paper we propose software-based approaches that increase robustness for these data-intensive, graph-based applications by managing resiliency in terms of data flow progress and validation of pointer computations. Our experimental results show that such a simple approach incurs fairly low execution time overheads while allowing these computations to survive a large number of faults that would otherwise always result in the termination of the computation.
Keywords :
data flow computing; fault tolerant computing; floating point arithmetic; graph theory; parallel machines; HPC; algorithmic fault tolerance; complex graph applications; data flow progress; data intensive supercomputing; floating point intensive scientific applications; graph abstraction; graph-based applications; high performance computing systems; pointer computations; resiliency techniques; robust graph traversal; software based approaches; Computer crashes; Data structures; Error correction codes; Fault tolerance; Fault tolerant systems; Robustness; Runtime;
Conference_Title :
High Performance Extreme Computing Conference (HPEC), 2013 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-1364-0
DOI :
10.1109/HPEC.2013.6670340