DocumentCode :
1631946
Title :
An Efficient Indexing Mechanism for Data Deduplication
Author :
Thwel, Tin Thein ; Thein, Ni Lar
Author_Institution :
Univ. of Comput. Studies, Yangon, Myanmar
fYear :
2009
Firstpage :
1
Lastpage :
5
Abstract :
Storage systems today hold a vast amount of duplicated or redundant data. Data deduplication can eliminate multiple copies of the same file as well as duplicated segments or chunks of data within those files. Data deduplication has therefore become an active field in storage environments, especially in persistent data storage for data centers. Many mechanisms have been proposed for efficient data deduplication in order to save storage space. A current issue in data deduplication is avoiding a full-chunk index scan to determine whether incoming data is new, which is a time-consuming process. In this paper, we propose an efficient indexing mechanism for this problem that takes advantage of B+ tree properties. In the proposed system, we first split the file into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a hash function to the chunks. The resulting ChunkIDs are used as indexing keys in a B+ tree-like index structure. The search time for duplicate file chunks thus drops from O(n) to O(log n), which avoids full-chunk indexing.
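The pipeline the abstract describes (chunk, hash, look up in an ordered index) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the chunker is a simplified Two Thresholds Two Divisors (TTTD) variant with a toy running hash instead of a sliding-window fingerprint, the threshold and divisor values are arbitrary, SHA-1 is an assumed choice of hash function, and a sorted list searched with `bisect` stands in for the B+ tree (lookups are O(log n), but insertion into a Python list is O(n), unlike a real B+ tree). All names (`tttd_chunks`, `ChunkIndex`, `deduplicate`) are hypothetical.

```python
import bisect
import hashlib


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64):
    """Split bytes into variable-length chunks (simplified TTTD, assumed parameters)."""
    chunks = []
    start = 0
    n = len(data)
    while start < n:
        h = 0          # toy running hash, restarted for every chunk
        backup = -1    # last position where the backup divisor matched
        end = min(start + t_max, n)
        for i in range(start, end):
            h = (h * 31 + data[i]) & 0xFFFFFFFF
            if i - start + 1 < t_min:
                continue  # below the minimum threshold: no breakpoints yet
            if h % d_main == d_main - 1:
                cut = i   # main divisor matched: cut here
                break
            if h % d_backup == d_backup - 1:
                backup = i
        else:
            # No main breakpoint. If the maximum threshold was reached, fall
            # back to the backup breakpoint when one exists; otherwise cut at
            # the end of the scanned range.
            hit_max = (end - start == t_max)
            cut = backup if (hit_max and backup >= 0) else end - 1
        chunks.append(data[start:cut + 1])
        start = cut + 1
    return chunks


class ChunkIndex:
    """Sorted list of ChunkIDs; a stand-in for a B+ tree index."""

    def __init__(self):
        self._keys = []

    def add_if_new(self, chunk_id):
        """Return True if chunk_id was absent and has now been inserted."""
        pos = bisect.bisect_left(self._keys, chunk_id)  # O(log n) search
        if pos < len(self._keys) and self._keys[pos] == chunk_id:
            return False
        self._keys.insert(pos, chunk_id)  # O(n) here; a B+ tree avoids this
        return True


def deduplicate(data, index):
    """Return (new_chunks, duplicate_chunks) for data against the index."""
    new = dup = 0
    for chunk in tttd_chunks(data):
        chunk_id = hashlib.sha1(chunk).hexdigest()  # assumed hash function
        if index.add_if_new(chunk_id):
            new += 1
        else:
            dup += 1
    return new, dup
```

Feeding the same data to `deduplicate` twice with one shared `ChunkIndex` should report every chunk as a duplicate on the second pass, since the chunk boundaries depend only on the content.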
Keywords :
data handling; indexing; tree data structures; B+ tree property; ChunkID; data deduplication; hash function; indexing mechanism; multiple copies elimination; redundant data; storage system; two thresholds two divisors chunking algorithm; variable length chunk; Cryptography; Indexing; Information retrieval; Law; Legal factors; Mechanical factors; Memory; Redundancy; Secure storage; Tin; b+ tree; data deduplication; indexing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Current Trends in Information Technology (CTIT), 2009 International Conference on the
Conference_Location :
Dubai
Print_ISBN :
978-1-4244-5754-0
Electronic_ISBN :
978-1-4244-5756-4
Type :
conf
DOI :
10.1109/CTIT.2009.5423123
Filename :
5423123