• DocumentCode
    163259
  • Title

    Improving performance of small-file accessing in Hadoop

  • Author

    Vorapongkitipun, Chatuporn ; Nupairoj, Natawut

  • Author_Institution
    Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
  • fYear
    2014
  • fDate
    14-16 May 2014
  • Firstpage
    200
  • Lastpage
    205
  • Abstract
    The Hadoop Distributed File System (HDFS) is an open-source system designed to run on commodity hardware and suited to applications with large data sets (terabytes). Because the HDFS architecture relies on a single master (NameNode) to manage metadata for multiple slaves (DataNodes), the NameNode often becomes a bottleneck, especially when handling a large number of small files. To maximize efficiency, the NameNode stores all HDFS metadata in main memory; with too many small files, it can run out of memory. In this paper, we propose a mechanism based on Hadoop Archive (HAR), called New Hadoop Archive (NHAR), to improve memory utilization for metadata and enhance the efficiency of accessing small files in HDFS. We also extend HAR to allow additional files to be inserted into existing archive files. Our experimental results show that our approach drastically improves the access efficiency of small files, outperforming HAR by up to 85.47%.
  • Keywords
    distributed processing; file organisation; meta data; public domain software; Datanode; HAR capabilities; HDFS; HDFS architecture; Hadoop distributed file system; NHAR; NameNode; commodity hardware; large data sets; memory utilization; metadata management for multiple slaves; new Hadoop archive; open source system; small file access efficiency; small-file accessing performance improvement; HAR; HDFS; Hadoop; Hadoop Archive; Improve performance; Small files in Hadoop;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
  • Conference_Location
    Chon Buri
  • Print_ISBN
    978-1-4799-5821-4
  • Type

    conf

  • DOI
    10.1109/JCSSE.2014.6841867
  • Filename
    6841867