Title :
Improving performance of small-file accessing in Hadoop
Author :
Vorapongkitipun, Chatuporn ; Nupairoj, Natawut
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
Abstract :
The Hadoop Distributed File System (HDFS) is an open-source system designed to run on commodity hardware and suited to applications with large data sets (terabytes). Because the HDFS architecture relies on a single master (NameNode) to manage metadata for multiple slaves (DataNodes), the NameNode often becomes a bottleneck, especially when handling a large number of small files. To maximize efficiency, the NameNode stores the entire metadata of HDFS in its main memory, so with too many small files the NameNode can run out of memory. In this paper, we propose a mechanism based on Hadoop Archive (HAR), called New Hadoop Archive (NHAR), to improve memory utilization for metadata and enhance the efficiency of accessing small files in HDFS. In addition, we extend HAR's capabilities to allow additional files to be inserted into existing archive files. Our experimental results show that our approach drastically improves the access efficiency of small files, outperforming HAR by up to 85.47%.
Keywords :
distributed processing; file organisation; meta data; public domain software; Datanode; HAR capabilities; HDFS; HDFS architecture; Hadoop distributed file system; NHAR; NameNode; commodity hardware; large data sets; memory utilization; metadata management for multiple slaves; new Hadoop archive; open source system; small file access efficiency; small-file accessing performance improvement; HAR; Hadoop; Hadoop Archive; Improve performance; Small files in Hadoop;
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
Conference_Location :
Chon Buri
Print_ISBN :
978-1-4799-5821-4
DOI :
10.1109/JCSSE.2014.6841867