Title :
A Strategy to Deal with Mass Small Files in HDFS
Author :
Shuo Zhang ; Li Miao ; Dafang Zhang ; Yuli Wang
Author_Institution :
Dept. of Comput. Sci. & Eng., Hunan Univ., Changsha, China
Abstract :
HDFS performs poorly when storing and managing a large number of small files, because they occupy a large amount of memory on the single Namenode and cause massive seeks and hopping from datanode to datanode. Traditional solutions are efficient only for specific file sizes or file formats. In this paper, we evaluate the performance of several existing solutions, such as HBase and Avro. To compensate for their inefficiency with medium-sized small files, we then implement a merging and prefetching mechanism. Finally, to reduce the influence of different file size distributions, we present a strategy that applies different schemes to small files of different sizes. Performance comparison experiments show that the strategy improves the writing and reading performance of the original HDFS by about 70%.
Keywords :
storage management; HDFS; datanode; file format; file size distributions; mass small files; memory occupation; prefetching mechanism; single Namenode; File systems; Indexes; Information services; Memory management; Merging; Prefetching; Writing; Avro; HBASE; small files; strategy
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2014 Sixth International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4799-4956-4
DOI :
10.1109/IHMSC.2014.87