Title :
File-Access Characteristics of Data-Intensive Workflow Applications
Author :
Shibata, Takeshi ; Choi, SungJun ; Taura, Kenjiro
Abstract :
This paper studies five real-world data-intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data-intensive workflows are becoming increasingly important applications for cluster and Grid environments. They pose new challenges to various components of workflow execution environments, including job dispatchers, schedulers, file systems, and file staging tools. Their impacts on real workloads are largely unknown. Understanding the characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications, focusing on their file access patterns, and summarize their implications for schedulers and file system/staging designs.
Keywords :
Astronomy; Clouds; Data analysis; File systems; Image analysis; Natural language processing; Parallel processing; Parallel programming; Pattern analysis; Scheduling algorithm; Workflow Applications;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on
Conference_Location :
Melbourne, Australia
Print_ISBN :
978-1-4244-6987-1
DOI :
10.1109/CCGRID.2010.77