  • DocumentCode
    180746
  • Title
    DedupT: Deduplication for tape systems
  • Author
    Gharaibeh, Ammar ; Constantinescu, C. ; Lu, Maohua ; Routray, Ramani ; Sharma, Ashok ; Sarkar, Pradyut ; Pease, D. ; Ripeanu, Matei
  • Author_Institution
    Univ. of British Columbia, Vancouver, BC, Canada
  • fYear
    2014
  • fDate
    2-6 June 2014
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    Deduplication is a commonly used technique on disk-based storage pools. However, deduplication has not been used for tape-based pools: tape characteristics such as high mount and seek times, combined with the data fragmentation that deduplication causes, create a toxic combination that leads to unacceptably high retrieval times. This work proposes DedupT, a system that efficiently supports deduplication on tape pools. This paper (i) details the main challenges in enabling efficient deduplication on tape libraries, (ii) presents a class of solutions based on graph modeling of similarity between data items that enables efficient placement on tapes, and (iii) presents the design and evaluation of novel cross-tape and on-tape chunk placement algorithms that alleviate tape mount time overhead and reduce on-tape data fragmentation. Using 4.5 TB of real-world workloads, we show that DedupT retains at least 95% of the deduplication efficiency. We also show that DedupT mitigates major retrieval time overheads and, because it reads less data, offers better restore performance than restoring non-deduplicated data.
  • Keywords
    data handling; graph theory; magnetic tape storage; storage management; DedupT; cross-tape chunk placement algorithm; data item similarity; deduplication efficiency; disk-based storage pools; graph-modeling; on-tape chunk placement algorithm; on-tape data fragmentation reduction; retrieval time overhead; seek time; tape characteristics; tape libraries; tape mount time overhead; tape pool deduplication; tape systems; Algorithm design and analysis; Clustering algorithms; Computational modeling; Data models; Databases; Libraries; Servers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mass Storage Systems and Technologies (MSST), 2014 30th Symposium on
  • Conference_Location
    Santa Clara, CA
  • Type
    conf
  • DOI
    10.1109/MSST.2014.6855555
  • Filename
    6855555