Title :
Hadoop distributed file system for the Grid
Author :
Attebury, Garhan ; Baranovski, Andrew ; Bloom, Ken ; Bockelman, Brian ; Kcira, Dorian ; Letts, James ; Levshina, Tanya ; Lundestedt, Carl ; Martin, Terrence ; Maier, Will ; Pi, Haifeng ; Rana, Abhishek ; Sfiligoi, Igor ; Sim, Alexander ; Thomas, Michael ;
Author_Institution :
Univ. of Nebraska Lincoln, Lincoln, NE, USA
Date :
Oct. 24 2009-Nov. 1 2009
Abstract :
Data distribution, storage, and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained high inter-DataNode data transfer rates can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfers to HDFS, supported by BeStMan and tuned GridFTP servers, demonstrate the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, and it is demonstrated at the Large Hadron Collider (LHC) computing centers. The operational simplicity of an HDFS-based SE significantly reduces the cost of ownership of Petabyte-scale data storage compared with alternative solutions.
Keywords :
electronic data interchange; file servers; grid computing; high energy physics instrumentation computing; information retrieval; information storage; wide area networks; BeStMan server; CPU intensive computing; HDFS; Hadoop distributed file system; LHC computing centers; Large Hadron Collider; OSG middleware stack; Open Science Grid; Petabyte scale data storage; WAN transfer; data access; data distribution; data intensive computing; high performance Grid computing; inter-datanode data transfer; scalability tests; storage element; tuned GridFTP server; Costs; File systems; Grid computing; Large Hadron Collider; Memory; Middleware; Robustness; Scalability; System testing; Wide area networks;
Conference_Title :
Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-4244-3961-4
Electronic_ISBN :
1095-7863
DOI :
10.1109/NSSMIC.2009.5402426