Title :
The optimization of HDFS based on small files
Author :
Jiang, Liu ; Li, Bing ; Song, Meina
Author_Institution :
Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
HDFS is a distributed file system that can process large amounts of data effectively across large clusters, and the Hadoop framework built on top of it has been widely used to construct large-scale, high-performance systems. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with large numbers of small files. Many companies today focus on cloud storage, such as Amazon's S3, which provides data hosting. With the rapid development of the Internet, users are likely to store more of their data and programs on cloud computing platforms in the future, and personal data has an obvious feature: most of it consists of small files, a workload that HDFS cannot handle well. In this article, we optimize HDFS I/O for small files. The basic idea is to let one block store many small files and to let the datanode keep some metadata about these small files in its memory. Experiments show that our design provides better performance.
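Example (editorial illustration, not the authors' code): the sketch below shows one way the idea in the abstract can be expressed with the standard Hadoop FileSystem API, packing many small files into a single large file (intended to fill one HDFS block) and keeping an in-memory index of name, offset, and length so that each small file can be read back with a single seek. The class name SmallFilePacker and its pack/read methods are hypothetical, and the packing is done on the client side here rather than inside the datanode as the paper proposes.

```java
// Minimal sketch of the small-file packing idea, assuming the standard
// Hadoop FileSystem API. Not the authors' implementation.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SmallFilePacker {

    /** Offset and length of one small file inside the packed file. */
    static final class Entry {
        final long offset;
        final int length;
        Entry(long offset, int length) { this.offset = offset; this.length = length; }
    }

    // In-memory metadata, analogous to the per-block index the paper
    // proposes to keep in the datanode's memory.
    private final Map<String, Entry> index = new HashMap<>();
    private final FileSystem fs;
    private final Path packedFile;

    SmallFilePacker(FileSystem fs, Path packedFile) {
        this.fs = fs;
        this.packedFile = packedFile;
    }

    /** Append the given small files into one packed file, recording their offsets. */
    void pack(Path[] smallFiles) throws IOException {
        try (FSDataOutputStream out = fs.create(packedFile, true)) {
            byte[] buf = new byte[4096];
            for (Path p : smallFiles) {
                long start = out.getPos();
                try (FSDataInputStream in = fs.open(p)) {
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        out.write(buf, 0, n);
                    }
                }
                index.put(p.getName(), new Entry(start, (int) (out.getPos() - start)));
            }
        }
    }

    /** Read one small file back via the in-memory index: one open, one seek. */
    byte[] read(String name) throws IOException {
        Entry e = index.get(name);
        if (e == null) throw new IOException("unknown small file: " + name);
        byte[] data = new byte[e.length];
        try (FSDataInputStream in = fs.open(packedFile)) {
            in.seek(e.offset);
            in.readFully(data);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // With a default Configuration this runs against the local file system,
        // which is enough to demonstrate the packing and indexed read-back.
        FileSystem fs = FileSystem.get(new Configuration());
        SmallFilePacker packer = new SmallFilePacker(fs, new Path("/tmp/packed.bin"));
        Path[] inputs = new Path[args.length];
        for (int i = 0; i < args.length; i++) inputs[i] = new Path(args[i]);
        packer.pack(inputs);
        for (String a : args) {
            System.out.println(a + " -> " + packer.read(new Path(a).getName()).length + " bytes");
        }
    }
}
```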
Keywords :
cloud computing; distributed processing; optimisation; storage management; Amazon s3; HDFS I/O feature; HDFS optimization; Internet; cloud storage areas; data hosting; distributed file system; small files; HADOOP; HDFS; small files I/O;
Conference_Titel :
2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology (IC-BNMT)
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6769-3
DOI :
10.1109/ICBNMT.2010.5705223