Title :
ReDedup: Data Reallocation for Reading Performance Optimization in Deduplication System
Author :
Bin Lin ; Shanshan Li ; Xiangke Liao ; Jing Zhang
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Deduplication technology is increasingly used to reduce storage cost. In practice, it often introduces additional on-disk fragmentation that impairs read performance. To reduce the impact of fragmentation, the traditional approach of defragmentation, which reallocates files on disk to achieve a contiguous layout, has been widely adopted in many operating systems. Unfortunately, file defragmentation is highly constrained by block sharing in a deduplication system, which makes it impossible for all files to have a perfectly sequential on-disk layout. In this paper, we propose ReDedup, which performs data reallocation in deduplication storage with the goal of mitigating the impact of disk fragments. ReDedup is motivated by an observation about real-world I/O workloads: the access frequency distribution of duplicated data is non-uniform. Leveraging this skew, ReDedup makes the majority of I/O requests more sequential by eliminating the fragments of hot files and shifting them to rarely read files. To achieve this objective, ReDedup dynamically estimates the access randomness and block-sharing relationships of individual files based on historical I/O activity, and then uses a greedy algorithm to selectively reallocate files and place them sequentially on disk. Our experimental evaluation of a ReDedup prototype on real-world datasets shows that ReDedup improves read performance by 28%-40%.
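The idea of shifting fragments from hot files to rarely read files can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give ReDedup's estimator or placement rules, so the function names (`greedy_layout`, `fragments`), the per-file read counts, and the "hottest file first" ordering are all assumptions made for illustration.

```python
from typing import Dict, List


def greedy_layout(files: Dict[str, List[str]], reads: Dict[str, int]) -> List[str]:
    """Hypothetical greedy placement: lay out files from most to least read.

    Each block is stored once (deduplicated). A block shared with a hotter
    file stays where the hotter file put it, so the colder file absorbs
    the fragmentation.
    """
    placed = set()
    layout: List[str] = []
    for name in sorted(files, key=lambda f: reads.get(f, 0), reverse=True):
        for block in files[name]:
            if block not in placed:
                placed.add(block)
                layout.append(block)
    return layout


def fragments(blocks: List[str], layout: List[str]) -> int:
    """Count the contiguous on-disk runs touched by a sequential read of a file."""
    if not blocks:
        return 0
    pos = {b: i for i, b in enumerate(layout)}
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if pos[cur] != pos[prev] + 1:
            runs += 1  # a seek is needed between prev and cur
    return runs


# Assumed toy workload: block "b" is shared by both files.
files = {"hot": ["a", "b", "c"], "cold": ["x", "b", "y"]}
reads = {"hot": 100, "cold": 2}

layout = greedy_layout(files, reads)
print(layout)
print({name: fragments(blocks, layout) for name, blocks in files.items()})
```

In this toy workload the frequently read file ends up in a single contiguous run, while the rarely read file, which shares block "b", is split into three runs.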
Keywords :
file organisation; greedy algorithms; input-output programs; operating systems (computers); ReDedup; block sharing; contiguous layout; data reallocation; deduplication system; deduplication technology; duplicated data; file defragmentation; greedy algorithm; nonuniform access frequency distribution; on-disk fragments; operating systems; perfect sequential on-disk layout; reading performance; reading performance optimization; real world IO workloads; Educational institutions; Electronic mail; History; Layout; Optimization; Postal services; Servers; Block access pattern; Data reallocation; Deduplication; Defragmentation;
Conference_Title :
Advanced Cloud and Big Data (CBD), 2013 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4799-3260-3
DOI :
10.1109/CBD.2013.28