Title :
Data placement for scientific applications in distributed environments
Author :
Chervenak, Ann ; Deelman, Ewa ; Livny, Miron ; Su, Mei-Hui ; Schuler, Rob ; Bharathi, Shishir ; Mehta, Gaurang ; Vahi, Karan
Author_Institution :
USC Inf. Sci. Inst. Marina Del Rey, Marina
Abstract :
Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
Keywords :
distributed processing; natural sciences computing; storage management; workflow management software; data placement; data replication service; distributed environment; scientific application; workflow management system; Application software; Astronomy; Availability; Computer science; Distributed computing; High performance computing; Information analysis; Large-scale systems; Performance analysis; Workflow management software
Conference_Title :
Grid Computing, 2007 8th IEEE/ACM International Conference on
Conference_Location :
Austin, Texas
Print_ISBN :
978-1-4244-1560-1
Electronic_ISBN :
978-1-4244-1560-1
DOI :
10.1109/GRID.2007.4354142