Title :
Grid Datafarm Architecture for Petascale Data Intensive Computing
Author :
Tatebe, Osamu ; Morita, Youhei ; Matsuoka, Satoshi ; Soda, Noriyuki ; Sekiguchi, Satoshi
Abstract :
The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single filesystem image and manipulate filesystem metadata consistently. Fault tolerance and load balancing are managed automatically by file duplication or by recomputation using a command history log. Preliminary performance evaluation has shown scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster. Gfarm parallel I/O write and read operations achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. The Gfarm parallel file copy reached 443 MB/s with 23 parallel streams over Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing with I/O bandwidth that scales to the TB/s range and with scalable computational power.
Keywords :
Bandwidth; Computer architecture; Computer industry; Concurrent computing; Fault tolerance; Grid computing; Large Hadron Collider; Large-scale systems; Petascale computing; Processor scheduling;
Conference_Title :
Cluster Computing and the Grid, 2002. 2nd IEEE/ACM International Symposium on
Print_ISBN :
0-7695-1582-7
DOI :
10.1109/CCGRID.2002.1017117