DocumentCode :
3647446
Title :
Multicluster Hadoop Distributed File System
Author :
I. Tomašić;J. Ugovšek;A. Rashkovska;R. Trobec
Author_Institution :
Jož
fYear :
2012
fDate :
5/1/2012 12:00:00 AM
Firstpage :
301
Lastpage :
305
Abstract :
The Hadoop Distributed File System (HDFS) is an important subproject of the Apache Hadoop project that allows distributed processing of, and fast access to, large data sets on distributed storage platforms. HDFS is normally installed on a cluster of computers. When the cluster becomes undersized, one commonly used option is to scale the cluster by adding new computers and storage devices. Another option, not exploited so far, is to draw on the resources of another computer cluster. In this paper we present a multicluster HDFS installation extended across two clusters, with different operating systems, connected over the Internet. The specific networking parameters and HDFS configuration parameters needed for a multicluster installation are presented. We have benchmarked a single-cluster and a dual-cluster installation with the same networking and configuration parameters. The benchmark results indicate that a multicluster HDFS provides increased storage space; however, the data manipulation speed is limited by the bandwidth of the communication channel that connects the two clusters.
Keywords :
"Bandwidth","File systems","Benchmark testing","Computers","Cloud computing","Operating systems"
Publisher :
IEEE
Conference_Titel :
MIPRO, 2012 Proceedings of the 35th International Convention
Print_ISBN :
978-1-4673-2577-6
Type :
conf
Filename :
6240660