  • DocumentCode
    967571
  • Title
    HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems
  • Author
    Zhu, Yifeng ; Jiang, Hong ; Wang, Jun ; Xian, Feng
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Maine, Orono, ME
  • Volume
    19
  • Issue
    6
  • fYear
    2008
  • fDate
    6/1/2008
  • Firstpage
    750
  • Lastpage
    763
  • Abstract
    An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, Bloom filter arrays with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and an implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data at the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.
  • Keywords
    Linux; filtering theory; meta data; storage management; HBA; decentralizing metadata management; distributed metadata management; filenames; hierarchical Bloom filter arrays; large cluster-based storage systems; single-metadata-server architecture; Distributed file systems; Distributed systems; File Systems Management; Parallel systems; Storage Management
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2007.70788
  • Filename
    4378361
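
The two-level lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class names, the SHA-256-based hashing, the filter sizes, and the rule of accepting a level only when exactly one server matches are all assumptions made for the example. What happens when neither level gives a unique answer is not stated in the abstract, so the sketch simply returns `None` and leaves the fallback to the caller.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; bit count and hash count control its accuracy."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function
        # (an assumption for this sketch, not taken from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class BloomFilterArray:
    """One Bloom filter per metadata server, replicated on every server."""

    def __init__(self, num_servers, num_bits, num_hashes):
        self.filters = [BloomFilter(num_bits, num_hashes)
                        for _ in range(num_servers)]

    def record(self, server_id, filename):
        self.filters[server_id].add(filename)

    def candidates(self, filename):
        # Servers whose filter reports a (possibly false) positive.
        return [sid for sid, bf in enumerate(self.filters) if filename in bf]


def lookup(filename, hot_array, global_array):
    """Try the high-accuracy cache array first, then the low-accuracy one.

    Returns a server id when a level yields exactly one candidate,
    otherwise None (hypothetical signal for the caller to fall back).
    """
    for level in (hot_array, global_array):
        hits = level.candidates(filename)
        if len(hits) == 1:
            return hits[0]
    return None


# Hypothetical sizes: a small, coarse filter per server for all metadata,
# and a larger, more accurate filter for recently accessed files.
global_array = BloomFilterArray(num_servers=4, num_bits=64, num_hashes=2)
hot_array = BloomFilterArray(num_servers=4, num_bits=4096, num_hashes=7)

global_array.record(2, "/home/alice/report.txt")
hot_array.record(2, "/home/alice/report.txt")
global_array.record(1, "/var/log/old.log")

print(lookup("/home/alice/report.txt", hot_array, global_array))
print(lookup("/var/log/old.log", hot_array, global_array))
```

Because a Bloom filter can report false positives but never false negatives, the server that actually holds a file always appears among the candidates; the higher-accuracy array merely makes it more likely that it is the only one.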