DocumentCode
264982
Title
A Strategy to Deal with Mass Small Files in HDFS
Author
Shuo Zhang ; Li Miao ; Dafang Zhang ; Yuli Wang
Author_Institution
Dept. of Comput. Sci. & Eng., Hunan Univ., Changsha, China
Volume
1
fYear
2014
fDate
26-27 Aug. 2014
Firstpage
331
Lastpage
334
Abstract
HDFS performs poorly when storing and managing a large number of small files, because of the heavy memory occupation on the single Namenode and the massive number of seeks and hops from datanode to datanode. Traditional solutions are efficient only for specific file sizes or file formats. In this paper, we evaluate the performance of several existing solutions, such as HBase and Avro. Then, to compensate for their inefficiency with medium-sized small files, we implement a merging and prefetching mechanism. Finally, to reduce the influence of different file size distributions, we present a strategy that applies different schemes to small files of different sizes. Performance comparison experiments show that the strategy improves the writing and reading performance of the original HDFS by about 70%.
Keywords
storage management; HDFS; datanode; file format; file size distributions; mass small files; memory occupation; prefetching mechanism; single Namenode; File systems; Indexes; Information services; Memory management; Merging; Prefetching; Writing; Avro; HBASE; HDFS; small files; strategy;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2014 Sixth International Conference on
Conference_Location
Hangzhou
Print_ISBN
978-1-4799-4956-4
Type
conf
DOI
10.1109/IHMSC.2014.87
Filename
6917370