DocumentCode
1631946
Title
An Efficient Indexing Mechanism for Data Deduplication
Author
Thwel, Tin Thein ; Thein, Ni Lar
Author_Institution
Univ. of Comput. Studies, Yangon, Myanmar
fYear
2009
Firstpage
1
Lastpage
5
Abstract
Storage systems currently hold a vast amount of duplicated or redundant data. Data deduplication can eliminate multiple copies of the same file as well as duplicated segments or chunks of data within those files. Data deduplication has therefore become an active field in storage environments, especially in persistent data storage for data centers. Many mechanisms have been proposed for efficient data deduplication in order to save storage space. A current issue in data deduplication is avoiding full-chunk indexing to determine whether incoming data is new, which is a time-consuming process. In this paper, we propose an efficient indexing mechanism for this problem that takes advantage of B+ tree properties. In our proposed system, we first separate the file into variable-length chunks using the Two Thresholds Two Divisors chunking algorithm. ChunkIDs are then obtained by applying a hash function to the chunks. The resulting ChunkIDs are used as indexing keys in a B+ tree-like index structure. The search time for duplicate file chunks thus drops from O(n) to O(log n), which avoids the need for full-chunk indexing.
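The pipeline described in the abstract (variable-length chunking, hashing chunks into ChunkIDs, and an ordered index searched in logarithmic time) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The fingerprint function, the threshold and divisor values, the choice of SHA-1 as the hash function, and the names `tttd_chunks`, `ChunkIndex`, and `deduplicate` are all assumptions made for illustration. A sorted list searched with `bisect` stands in for the B+ tree: it gives O(log n) lookup, but insertion into a Python list is O(n), which a real B+ tree avoids.

```python
import bisect
import hashlib


def _fingerprint(window_bytes):
    # Toy polynomial hash over a small window (an assumption; real systems
    # typically use a rolling Rabin fingerprint).
    h = 0
    for b in window_bytes:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64, window=16):
    """Split bytes into variable-length chunks using two thresholds
    (t_min, t_max) and two divisors (d_main, d_backup).
    Requires t_min >= window. All parameter values are illustrative."""
    chunks, start, backup = [], 0, -1
    for i in range(len(data)):
        length = i - start + 1
        if length < t_min:
            continue  # never cut before the minimum threshold
        fp = _fingerprint(data[i - window + 1 : i + 1])
        if fp % d_backup == d_backup - 1:
            backup = i  # remember a fallback breakpoint
        if fp % d_main == d_main - 1:
            chunks.append(data[start : i + 1])  # main divisor matched
            start, backup = i + 1, -1
        elif length >= t_max:
            # Maximum threshold reached: use the backup breakpoint if any.
            cut = backup if backup != -1 else i
            chunks.append(data[start : cut + 1])
            start, backup = cut + 1, -1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkIndex:
    """Ordered ChunkID index; a sorted list stands in for a B+ tree."""

    def __init__(self):
        self._keys = []

    def add(self, chunk_id):
        # Binary search: O(log n) comparisons to find the key position.
        pos = bisect.bisect_left(self._keys, chunk_id)
        if pos < len(self._keys) and self._keys[pos] == chunk_id:
            return False  # duplicate chunk, nothing stored
        self._keys.insert(pos, chunk_id)
        return True

    def __len__(self):
        return len(self._keys)


def deduplicate(data, index):
    """Return (new_chunks, duplicate_chunks) for one incoming byte string."""
    new = dup = 0
    for chunk in tttd_chunks(data):
        chunk_id = hashlib.sha1(chunk).hexdigest()  # SHA-1 is an assumption
        if index.add(chunk_id):
            new += 1
        else:
            dup += 1
    return new, dup
```

Feeding the same byte string to `deduplicate` twice with the same `ChunkIndex` reports every chunk as a duplicate on the second pass, since each ChunkID is already present in the index.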
Keywords
data handling; indexing; tree data structures; B+ tree property; ChunkID; data deduplication; hash function; indexing mechanism; multiple copies elimination; redundant data; storage system; two thresholds two divisors chunking algorithm; variable length chunk; Cryptography; Indexing; Information retrieval; Law; Legal factors; Mechanical factors; Memory; Redundancy; Secure storage; Tin; b+ tree; data deduplication; indexing
fLanguage
English
Publisher
IEEE
Conference_Titel
Current Trends in Information Technology (CTIT), 2009 International Conference on the
Conference_Location
Dubai
Print_ISBN
978-1-4244-5754-0
Electronic_ISBN
978-1-4244-5756-4
Type
conf
DOI
10.1109/CTIT.2009.5423123
Filename
5423123