DocumentCode :
1920880
Title :
CHAIO: Enabling HPC Applications on Data-Intensive File Systems
Author :
Jin, Hui ; Ji, Jiayu ; Sun, Xian-He ; Chen, Yong ; Thakur, Rajeev
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
fYear :
2012
fDate :
10-13 Sept. 2012
Firstpage :
369
Lastpage :
378
Abstract :
The computing paradigm of "HPC in the Cloud" has gained surging interest in recent years due to its cost-efficiency, flexibility, and scalability. The Cloud is designed on top of distributed file systems such as the Google File System (GFS). The capability of running HPC applications on top of data-intensive file systems is a critical catalyst in promoting Clouds for HPC. However, the semantic gap between data-intensive file systems and HPC imposes numerous challenges. For example, N-1 (N to 1) is a widely used data access pattern for HPC applications such as checkpointing, but it does not perform well on data-intensive file systems. In this study, we propose the CHunk-Aware I/O (CHAIO) strategy to enable efficient N-1 data access on data-intensive distributed file systems. CHAIO reorganizes I/O requests to favor data-intensive file systems and avoid possible access contention. It balances the workload distribution and promotes data locality. We have tested the CHAIO design over the Kosmos File System (KFS). Experimental results show that CHAIO achieves a more than two-fold improvement in I/O bandwidth for both write and read operations. Experiments in a large-scale environment confirm the potential of CHAIO for small and irregular requests. The aggregator selection algorithm works well to balance the workload distribution. CHAIO is a critical and necessary step to enable HPC in the Cloud.
Keywords :
cloud computing; file organisation; CHAIO; GFS; Google file system; HPC application; KFS; Kosmos file system; aggregator selection algorithm; chunk-aware I/O strategy; data access pattern; data locality; data-intensive file system; distributed file system; semantic gap; workload distribution; Checkpointing; Concurrent computing; Distributed databases; File systems; Hardware; Semantics; Servers; MapReduce; data-intensive; high-performance computing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing (ICPP), 2012 41st International Conference on
Conference_Location :
Pittsburgh, PA
ISSN :
0190-3918
Print_ISBN :
978-1-4673-2508-0
Type :
conf
DOI :
10.1109/ICPP.2012.1
Filename :
6337598