• DocumentCode
    1631946
  • Title

    An Efficient Indexing Mechanism for Data Deduplication

  • Author

    Thwel, Tin Thein ; Thein, Ni Lar

  • Author_Institution
    Univ. of Comput. Studies, Yangon, Myanmar
  • fYear
    2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    At present, storage systems hold a vast amount of duplicated or redundant data. Data deduplication can eliminate multiple copies of the same file as well as duplicated segments or chunks of data within those files. Data deduplication has therefore become an active field in storage environments, especially in persistent data storage for data centers. Many mechanisms have been proposed for efficient data deduplication in order to save storage space. A current issue in data deduplication is avoiding full-chunk indexing when identifying whether incoming data is new, which is a time-consuming process. In this paper, we propose an efficient indexing mechanism for this problem that takes advantage of B+ tree properties. In the proposed system, we first separate the file into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a hash function to the chunks. The resulting ChunkIDs are used as indexing keys in a B+ tree-like index structure. The search time for duplicate file chunks is thus reduced from O(n) to O(log n), which avoids the risk of full-chunk indexing.
  • Keywords
    data handling; indexing; tree data structures; B+ tree property; ChunkID; data deduplication; hash function; indexing mechanism; multiple copies elimination; redundant data; storage system; two thresholds two divisors chunking algorithm; variable length chunk; Cryptography; Indexing; Information retrieval; Law; Legal factors; Mechanical factors; Memory; Redundancy; Secure storage; Tin; b+ tree; data deduplication; indexing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Current Trends in Information Technology (CTIT), 2009 International Conference on the
  • Conference_Location
    Dubai
  • Print_ISBN
    978-1-4244-5754-0
  • Electronic_ISBN
    978-1-4244-5756-4
  • Type

    conf

  • DOI
    10.1109/CTIT.2009.5423123
  • Filename
    5423123
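
The pipeline summarized in the abstract (variable-length chunking, hashing each chunk into a ChunkID, then looking the ChunkID up in an ordered index) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function and class names, the threshold and divisor values, the simple polynomial window hash standing in for a rolling fingerprint, and SHA-1 as the hash function (the abstract does not name one). A sorted list searched with `bisect` stands in for the B+ tree-like index; it gives the same O(log n) lookup, although inserting into a Python list still shifts elements, which a real B+ tree avoids.

```python
import bisect
import hashlib


def window_hash(win):
    """Simple polynomial hash of a byte window (stand-in for a rolling fingerprint)."""
    h = 0
    for b in win:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64, window=16):
    """Split data into variable-length chunks in the style of TTTD.

    Two thresholds (t_min, t_max) bound the chunk size; two divisors
    (d_main, d_backup) select the main and the fallback breakpoints.
    All parameter values here are illustrative assumptions.
    """
    chunks = []
    start, n = 0, len(data)
    while start < n:
        limit = min(start + t_max, n)
        cut, backup, found = limit, -1, False
        for i in range(start + t_min, limit):
            h = window_hash(data[max(start, i - window + 1): i + 1])
            if h % d_backup == d_backup - 1:
                backup = i + 1          # remember a fallback breakpoint
            if h % d_main == d_main - 1:
                cut, found = i + 1, True  # main divisor matched: cut here
                break
        # At t_max with no main breakpoint, fall back to the backup one.
        if not found and backup != -1 and start + t_max <= n:
            cut = backup
        chunks.append(data[start:cut])
        start = cut
    return chunks


class ChunkIndex:
    """Ordered ChunkID index; a sorted list stands in for a B+ tree."""

    def __init__(self):
        self._keys = []

    def add_if_new(self, chunk_id):
        """Return True if chunk_id was new and has been inserted."""
        pos = bisect.bisect_left(self._keys, chunk_id)  # O(log n) search
        if pos < len(self._keys) and self._keys[pos] == chunk_id:
            return False
        self._keys.insert(pos, chunk_id)
        return True


def deduplicate(data, index):
    """Chunk data, hash each chunk, and count new versus duplicate chunks."""
    new, dup = 0, 0
    for chunk in tttd_chunks(data):
        chunk_id = hashlib.sha1(chunk).hexdigest()  # hash choice is an assumption
        if index.add_if_new(chunk_id):
            new += 1
        else:
            dup += 1
    return new, dup
```

Calling `deduplicate` twice on the same bytes with one shared `ChunkIndex` reports every chunk of the second pass as a duplicate, because the chunk boundaries depend only on content and each ChunkID is already present in the index.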