• DocumentCode
    264982
  • Title
    A Strategy to Deal with Mass Small Files in HDFS
  • Author
    Shuo Zhang; Li Miao; Dafang Zhang; Yuli Wang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Hunan Univ., Changsha, China
  • Volume
    1
  • fYear
    2014
  • fDate
    26-27 Aug. 2014
  • Firstpage
    331
  • Lastpage
    334
  • Abstract
    HDFS performs poorly when storing and managing a large number of small files. The causes are the heavy memory usage on the single NameNode and the large number of seeks and hops from DataNode to DataNode. Traditional solutions are efficient only for a specific file size or file format. In this paper, we evaluate the performance of several solutions, such as HBase and Avro. To compensate for their inefficiency with medium-sized small files, we then implement a merging and prefetching mechanism. Finally, to reduce the influence of different file size distributions, we present a strategy that applies different schemes to small files of different sizes. Performance comparison experiments show that the strategy improves the writing and reading performance of the original HDFS by about 70%.
  • Keywords
    storage management; HDFS; datanode; file format; file size distributions; mass small files; memory occupation; prefetching mechanism; single Namenode; File systems; Indexes; Information services; Memory management; Merging; Prefetching; Writing; Avro; HBase; small files; strategy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2014 Sixth International Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4799-4956-4
  • Type
    conf
  • DOI
    10.1109/IHMSC.2014.87
  • Filename
    6917370
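The abstract names three ingredients: routing small files to different schemes by size, merging medium-sized small files into larger files, and prefetching on read. The Python sketch below illustrates how those pieces might fit together. It is a minimal illustration under stated assumptions, not the authors' implementation. The size thresholds, the mapping of size ranges to HBase, Avro, and plain HDFS, the in-memory `MergedFile` with its offset index, and the "prefetch the next files in write order" policy are all hypothetical. The abstract says only that merging and prefetching target medium-sized small files.

```python
from collections import OrderedDict

# Hypothetical size thresholds in bytes; the abstract does not state the paper's values.
TINY_LIMIT = 64 * 1024          # below this: HBase (assumed mapping)
MIDDLE_LIMIT = 1024 * 1024      # below this: merging + prefetching
LARGE_LIMIT = 64 * 1024 * 1024  # below this: Avro (assumed); otherwise plain HDFS


def choose_scheme(size: int) -> str:
    """Route a file to a storage scheme by its size (illustrative mapping only)."""
    if size < 0:
        raise ValueError("file size must be non-negative")
    if size < TINY_LIMIT:
        return "hbase"
    if size < MIDDLE_LIMIT:
        return "merge_prefetch"
    if size < LARGE_LIMIT:
        return "avro"
    return "hdfs"


class MergedFile:
    """In-memory stand-in for one large merged file plus its name -> (offset, length) index."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.index = {}  # name -> (offset, length); dicts keep insertion (write) order

    def append(self, name: str, payload: bytes) -> None:
        # Record where this small file starts inside the merged file, then append its bytes.
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name: str) -> bytes:
        # One index lookup replaces a per-file metadata lookup.
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


class PrefetchingReader:
    """Reads from a MergedFile and prefetches the next files in write order into an LRU cache."""

    def __init__(self, merged: MergedFile, window: int = 2, capacity: int = 8) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.merged = merged
        self.order = list(merged.index)  # snapshot of write order when the reader is created
        self.pos = {name: i for i, name in enumerate(self.order)}
        self.window = window
        self.capacity = capacity
        self.cache = OrderedDict()  # name -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def _load(self, name: str) -> None:
        # Fetch from the merged file; evict least recently used entries when over capacity.
        self.cache[name] = self.merged.read(name)
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def read(self, name: str) -> bytes:
        if name in self.cache:
            self.hits += 1
            self.cache.move_to_end(name)
        else:
            self.misses += 1
            self._load(name)
        data = self.cache[name]  # keep a reference before prefetching can evict it
        # Prefetch the files stored right after this one (assumes sequential access locality).
        i = self.pos[name]
        for neighbor in self.order[i + 1:i + 1 + self.window]:
            if neighbor not in self.cache:
                self._load(neighbor)
        return data
```

Prefetching by write order helps only if related files are written together and later read in roughly the same order. That locality is an assumption of this sketch, and the paper's own merging and prefetching policy may differ.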