  • DocumentCode
    1925358
  • Title
    Improving metadata management for small files in HDFS
  • Author
    Mackey, Grant; Sehrish, Saba; Wang, Jun
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    2009
  • fDate
    Aug. 31 2009-Sept. 4 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Scientific applications are adopting HDFS/MapReduce to perform large-scale data analytics. A major challenge is that these applications commonly produce an overabundance of small files, while HDFS manages all of its files through a single server, the Namenode. Small files are therefore expected to significantly impact Namenode performance. In this work we propose a mechanism to store small files in HDFS efficiently and improve the space utilization for metadata. Our scheme assumes that each client is assigned a file-system quota for both space and number of files. In our approach, we use 'harballing', the archiving method provided by Hadoop, to make better use of HDFS. We provide new job functionality that allows in-job archival of directories and files, so that running MapReduce programs can complete without being killed by the jobtracker because of quota policies. This approach improves metadata operations and makes more efficient use of HDFS. Our analysis shows that we can reduce the metadata footprint in main memory by a factor of 42.
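The abstract's core argument is that packing many small files into one Hadoop Archive (HAR) shrinks the number of objects the Namenode must hold in memory. The Python sketch below is a hypothetical back-of-envelope model of that effect, not the authors' analysis. The figure of roughly 150 bytes of Namenode heap per file, directory, or block object, the simplified HAR layout (a .har directory, an _index and a _masterindex file, and a set of part files), and the function names `plain_objects`, `har_objects`, `reduction_factor`, and `heap_bytes` are all assumptions made for illustration.

```python
import math

# ASSUMPTION (rule of thumb, not from the paper): each file, directory,
# or block object held by the Namenode costs roughly 150 bytes of heap.
BYTES_PER_OBJECT = 150


def plain_objects(num_files: int, file_size: int, block_size: int) -> int:
    """Hypothetical count of Namenode objects when small files are stored directly."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    # One inode per file, plus its blocks, plus one parent directory.
    return num_files * (1 + blocks_per_file) + 1


def har_objects(num_files: int, file_size: int, block_size: int,
                num_parts: int) -> int:
    """Hypothetical count of Namenode objects after packing the files into one HAR.

    ASSUMED simplified layout: the .har directory, an _index and a
    _masterindex file (one block each), and `num_parts` part files that
    share the packed data evenly. Per-file metadata lives inside _index,
    i.e. in datanode blocks rather than in Namenode memory.
    """
    total_bytes = num_files * file_size
    bytes_per_part = math.ceil(total_bytes / num_parts)
    blocks_per_part = max(1, math.ceil(bytes_per_part / block_size))
    index_objects = 2 * (1 + 1)                # two index files, one block each
    part_objects = num_parts * (1 + blocks_per_part)
    return 1 + index_objects + part_objects    # 1 = the .har directory itself


def reduction_factor(num_files: int, file_size: int, block_size: int,
                     num_parts: int) -> float:
    """Ratio of Namenode objects before and after archiving (model only)."""
    before = plain_objects(num_files, file_size, block_size)
    after = har_objects(num_files, file_size, block_size, num_parts)
    return before / after


def heap_bytes(objects: int) -> int:
    """Approximate Namenode heap usage under the assumed per-object cost."""
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    KB, MB = 1024, 1024 * 1024
    before = plain_objects(10_000, 100 * KB, 64 * MB)
    after = har_objects(10_000, 100 * KB, 64 * MB, num_parts=1)
    print(before, after, heap_bytes(before), heap_bytes(after))
```

For example, 10,000 files of 100 KB each, a 64 MB block size, and a single part file give 20,001 objects before archiving and 22 after under this model. These numbers only illustrate the mechanism; they are not meant to reproduce the factor of 42 reported in the paper, which depends on the authors' workloads and accounting.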
  • Keywords
    cache storage; data analysis; meta data; HDFS-MapReduce program; Namenode; file storage; hadoop distributed file system; large datasets analysis; metadata management; single server; Application software; Availability; Computer architecture; Computer science; Data analysis; Engineering management; File servers; File systems; Large-scale systems; Performance analysis;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    1552-5244
  • Print_ISBN
    978-1-4244-5011-4
  • Electronic_ISBN
    1552-5244
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2009.5289133
  • Filename
    5289133