Title :
A novel approach to improve the performance of Hadoop in handling of small files
Author :
Gohil, Parth ; Panchal, Bakul ; Dhobi, J.S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Inst. of Technol., Varnama, India
Abstract :
Hadoop is an open-source Java framework for processing big data. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model that processes data in a parallel and distributed manner. Hadoop is designed to handle very large files and performs poorly with small ones: a large number of small files places a heavy burden on the NameNode of HDFS and increases MapReduce execution time. This work introduces HDFS, the small-file problem, and existing ways of dealing with it, and then proposes an approach to handling small files. In the proposed approach, small files are merged using the MapReduce programming model on Hadoop. The approach improves Hadoop's performance in handling small files by ignoring files whose size is larger than the Hadoop block size, and it also reduces the memory the NameNode requires to store them.
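The selection step described in the abstract (skip files larger than the block size, merge the rest) can be illustrated with a minimal sketch. This is not the authors' MapReduce implementation: it is plain Python that only plans which small files would be merged together. The function name `plan_merge`, the 128 MB block size, and the first-fit-decreasing packing strategy are all assumptions made for illustration; the abstract does not say how the merged files are filled.

```python
from typing import Dict, List, Tuple

# Assumed HDFS block size in bytes; the real value is a cluster setting.
BLOCK_SIZE = 128 * 1024 * 1024


def plan_merge(
    file_sizes: Dict[str, int], block_size: int = BLOCK_SIZE
) -> Tuple[List[List[str]], List[str]]:
    """Plan how small files could be grouped into merged files.

    Returns (groups, skipped):
      groups  - lists of file names whose combined size fits in one block
      skipped - files larger than block_size, which are left untouched
    The packing strategy (first-fit decreasing) is an assumption, not
    something stated in the abstract.
    """
    # Files larger than one block are ignored by the merge step.
    skipped = [name for name, size in file_sizes.items() if size > block_size]

    # Consider the remaining files from largest to smallest.
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size <= block_size),
        key=lambda item: item[1],
        reverse=True,
    )

    groups: List[List[str]] = []
    remaining: List[int] = []  # free bytes left in each group

    for name, size in small:
        # Place the file in the first group that still has room for it.
        for i, free in enumerate(remaining):
            if size <= free:
                groups[i].append(name)
                remaining[i] -= size
                break
        else:
            # No existing group fits, so start a new merged file.
            groups.append([name])
            remaining.append(block_size - size)

    return groups, skipped
```

Each resulting group would be stored as a single merged file, so the NameNode tracks one entry per group instead of one per small file, which is the source of the memory saving the abstract refers to.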
Keywords :
data handling; distributed databases; parallel processing; HDFS; Hadoop distributed file system; MapReduce programming; NameNode; programming model; small files handling; Blogs; File systems; Memory management; Tutorials; Amazon EC2; Hadoop; MapReduce; Small Files
Conference_Title :
Electrical, Computer and Communication Technologies (ICECCT), 2015 IEEE International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4799-6084-2
DOI :
10.1109/ICECCT.2015.7226044