• DocumentCode
    584601
  • Title
    Improving the Efficiency of Storing for Small Files in HDFS
  • Author
    Zhang, Yang; Liu, Dan
  • Author_Institution
    Res. Inst. of Electron. Sci. & Technol., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • fYear
    2012
  • fDate
    11-13 Aug. 2012
  • Firstpage
    2239
  • Lastpage
    2242
  • Abstract
    HDFS (Hadoop Distributed File System) is a popular distributed file system, but it handles small files inefficiently. Traditional methods suffer from high resource consumption and low efficiency. To resolve this problem, this paper proposes a novel approach to small-file processing that works as an engine independent of HDFS and effectively reduces HDFS overhead. The engine uses Reactor multiplexed IO to build the server and non-blocking IO to merge and read small files, and it keeps a cache of small files to make reads efficient. The paper presents a small-file processing strategy for efficient file merging, which builds a file index and uses a boundary file block filling mechanism to accomplish file separation and file retrieval. Experimental results show that the approach improves the efficiency of storing and processing massive numbers of small files in HDFS.
  • Keywords
    file organisation; indexing; information retrieval; input-output programs; records management; HDFS; Hadoop Distributed File System; boundary file block filling mechanism; file index; files efficient merger; files retrieval; files separation; massive small file processing; massive small file storing; nonblocking IO; reactor multiplexed IO; small files process; Conferences; Corporate acquisitions; Engines; File systems; Indexes; Memory management; Servers; Hadoop Distributed FileSystem; file merger mechanism; small file storage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Service System (CSSS), 2012 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4673-0721-5
  • Type
    conf
  • DOI
    10.1109/CSSS.2012.556
  • Filename
    6394874