Title :
Workflow-Based High Performance Data Transfer and Ingestion to Support Petascale Simulations on TeraGrid
Author :
Zhou, Jun ; Cui, Yifeng ; Davis, Sashka ; Guest, Clark C. ; Maechling, Philip
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of California, La Jolla, CA, USA
Abstract :
In this paper we report on the design of high-performance data transfer and ingestion carried out in a scientific workflow project that supports Southern California Earthquake Center (SCEC) petascale simulations on TeraGrid (TG). The design uses grid resources to pipeline data pre- and post-processing within the simulation workflow. We develop an enhanced prototype framework that combines the Globus Toolkit with advanced MPI batch jobs for reliable and efficient data transfer between heterogeneous supercomputer clusters on TG. The framework automates the entire data transfer process without human intervention and recovers automatically from failures during transfers. We also examine optimization approaches for ingesting simulation data into the iRODS (Integrated Rule-Oriented Data System) digital library. The average transfer rate from TACC Ranger to iRODS reaches 133 MB/s, 5 times faster than conventional methods. Experiments on TG clusters demonstrate that these concurrent data transfer and ingestion mechanisms shorten the processing time of the scientific workflow and significantly reduce the load as well.
Keywords :
application program interfaces; electronic data interchange; grid computing; message passing; pipeline processing; Globus Toolkit; Southern California Earthquake Center petascale simulations; TACC Ranger; Teragrid; advanced MPI; digital library; grid resource; integrated rule oriented data system; pipeline data processing; scientific workflow project; workflow based high performance data transfer; workflow simulation; Computational modeling; Computer simulation; Concurrent computing; Distributed computing; Earthquakes; Grid computing; High performance computing; Performance analysis; Petascale computing; Supercomputers; Data Archival; Data Transfer; Parallelism; iRODS;
Conference_Title :
Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
Conference_Location :
Huangshan
Print_ISBN :
978-1-4244-6812-6
Electronic_ISBN :
978-1-4244-6813-3
DOI :
10.1109/CSO.2010.235