Title :
Collective Computing for Scientific Big Data Analysis
Author :
Jialin Liu; Yong Chen; Surendra Byna
Author_Institution :
Dept. of Comput. Sci., Texas Tech Univ., Lubbock, TX, USA
Abstract :
Big science discovery requires an efficient computing framework on high performance computing (HPC) architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and to mitigate the I/O bottleneck. Within MPI-IO, two-phase collective I/O is commonly used to reduce data movement by optimizing non-contiguous I/O patterns. However, the inherent constraint of collective I/O prevents it from being flexibly combined with computation, and current HPC systems lack an efficient non-blocking I/O-computing framework. In this work, we propose Collective Computing, a framework that breaks the constraint of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage earlier and insert the computation into the two-phase I/O, so that the data read in the first I/O phase can be computed in place and the second (shuffle) phase is minimized through a reduce operation. We motivate this idea by profiling I/O and CPU usage. With both theoretical analysis and evaluation on a real application and benchmarks, we show that collective computing can achieve a 2.5X speedup and is promising for big scientific data analysis.
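The core idea described in the abstract can be illustrated with a toy, single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses no MPI, assumes an interleaved access pattern in which process p requests every element whose index i satisfies i mod P = p, and uses a sum as the analysis, because replacing the shuffle with a reduce requires an associative and commutative operation. The function names split_domains, two_phase_then_compute, and collective_compute are hypothetical. The sketch counts the data values exchanged after the read phase to show why computing in place shrinks the second phase.

```python
from typing import List, Tuple


def split_domains(n_elems: int, n_aggr: int) -> List[Tuple[int, int]]:
    """Split [0, n_elems) into contiguous file domains, one per aggregator."""
    base, extra = divmod(n_elems, n_aggr)
    domains, start = [], 0
    for a in range(n_aggr):
        size = base + (1 if a < extra else 0)
        domains.append((start, start + size))
        start += size
    return domains


def two_phase_then_compute(data: List[int], n_procs: int, n_aggr: int) -> Tuple[int, int]:
    """Conventional path: contiguous read, shuffle raw elements, then compute.

    Assumption (hypothetical pattern): process p owns indices i with i % n_procs == p.
    Every element handed off in the shuffle is counted as moved, even if an
    aggregator happens to be its owner.
    """
    received = {p: [] for p in range(n_procs)}
    moved = 0
    for lo, hi in split_domains(len(data), n_aggr):
        chunk = data[lo:hi]                      # phase 1: contiguous read by an aggregator
        for offset, value in enumerate(chunk):
            owner = (lo + offset) % n_procs
            received[owner].append(value)        # phase 2: shuffle raw data to its owner
            moved += 1
    local_sums = [sum(vals) for vals in received.values()]  # analysis runs only after I/O
    return sum(local_sums), moved


def collective_compute(data: List[int], n_aggr: int) -> Tuple[int, int]:
    """Collective-computing path: compute on the phase-1 buffer, then reduce."""
    partials = []
    for lo, hi in split_domains(len(data), n_aggr):
        partials.append(sum(data[lo:hi]))        # in-place computation on the read buffer
    # The shuffle is replaced by a reduce that moves one partial result per aggregator.
    return sum(partials), len(partials)


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(two_phase_then_compute(data, n_procs=8, n_aggr=2))  # (500500, 1000)
    print(collective_compute(data, n_aggr=2))                 # (500500, 2)
```

Both paths produce the same result, but in this toy model the conventional path moves every element during the shuffle, while the collective-computing path moves only one partial sum per aggregator. Real analyses that are not reducible in this way would not benefit as directly.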
Keywords :
"Big data","Data analysis","Electronic mail","Computer science","Benchmark testing","Inductors","Meteorology"
Conference_Title :
2015 44th International Conference on Parallel Processing Workshops (ICPPW)
DOI :
10.1109/ICPPW.2015.22