DocumentCode
5160
Title
MUCH: Multithreaded Content-Based File Chunking
Author
Youjip Won ; Kyeongyeol Lim ; Jaehong Min
Author_Institution
Div. of Comput. Sci. & Eng., Hanyang Univ., Seoul, South Korea
Volume
64
Issue
5
fYear
2015
fDate
May 1 2015
Firstpage
1375
Lastpage
1388
Abstract
In this work, we developed a novel multithreaded variable-size chunking method, MUCH, which exploits the multicore architecture of modern microprocessors. The legacy single-threaded variable-size chunking method leaves much to be desired in terms of effectively exploiting the bandwidth of state-of-the-art storage devices. MUCH guarantees chunking invariability: the result of chunking does not change regardless of the degree of multithreading or the segment size. This is achieved by inter- and intra-segment coalescing at the master thread and Dual Mode Chunking at the client thread. We developed an elaborate performance model to determine the optimal multithreading degree and segment size. MUCH is implemented in a prototype deduplication system. By fully exploiting the available CPU cores (quad-core), we achieved up to a 4× increase in chunking performance (MByte/sec). By parallelizing the file chunking operation while guaranteeing chunking invariability, MUCH successfully addresses the performance of file chunking, which is one of the bottlenecks in modern deduplication systems.
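The chunking invariability requirement can be illustrated with a minimal sketch. This is not the MUCH algorithm: the abstract does not specify Dual Mode Chunking or the coalescing steps, so they are not reproduced here. The sketch assumes a simple rolling-sum hash, and every name and constant in it (WINDOW, MASK, MIN_CHUNK, MAX_CHUNK, chunk_boundaries, naive_parallel_boundaries) is hypothetical. It shows a single-threaded content-defined chunker and a naive per-segment parallel variant, which forces a chunk boundary at every segment end and therefore generally yields results that depend on the segment size.

```python
from concurrent.futures import ThreadPoolExecutor
import random

# All constants are assumed values for illustration, not taken from the paper.
WINDOW = 16      # bytes covered by the rolling hash
MASK = 0x3F      # a boundary is declared when (hash & MASK) == 0
MIN_CHUNK = 32   # minimum chunk size in bytes
MAX_CHUNK = 256  # maximum chunk size in bytes


def chunk_boundaries(data):
    """Single-threaded content-defined chunking; returns chunk end offsets."""
    boundaries = []
    start = 0      # start offset of the chunk being built
    rolling = 0    # sum of the last WINDOW bytes (a deliberately simple hash)
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        length = i + 1 - start
        hit = length >= MIN_CHUNK and (rolling & MASK) == 0
        if hit or length >= MAX_CHUNK:
            boundaries.append(i + 1)
            start = i + 1
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries


def naive_parallel_boundaries(data, segment_size, workers=4):
    """Chunk each fixed-size segment independently, with no coalescing.

    Every segment end becomes a chunk boundary, so the result depends on
    segment_size. This is the behavior that invariability rules out.
    """
    segments = [data[o:o + segment_size]
                for o in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_segment = list(pool.map(chunk_boundaries, segments))
    result = []
    for index, local in enumerate(per_segment):
        result.extend(index * segment_size + b for b in local)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    sample = bytes(rng.randrange(256) for _ in range(4000))
    print("sequential:", chunk_boundaries(sample)[:8])
    print("naive, 1000-byte segments:",
          naive_parallel_boundaries(sample, 1000)[:8])
```

Comparing the two outputs for different segment sizes shows why a multithreaded chunker needs an additional step at segment edges if its result is to match the single-threaded one.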
Keywords
multi-threading; multiprocessing systems; parallel architectures; storage management; CPU cores; MUCH; art storage devices; chunking invariability; chunking performance; client thread; dual mode chunking; inter-segment coalescing; intra-segment coalescing; legacy single threaded variable size chunking method; master thread; modern microprocessors; multicore architecture; multithreaded content-based file chunking; multithreaded variable size chunking method; optimal multithreading degree; prototype deduplication system; quad-core; segment size multithreading; Bandwidth; Central Processing Unit; Hardware; Instruction sets; Multithreading; Redundancy; Upper bound; Content-based chunking; deduplication; multithread
fLanguage
English
Journal_Title
IEEE Transactions on Computers
Publisher
IEEE
ISSN
0018-9340
Type
jour
DOI
10.1109/TC.2014.2322600
Filename
6815680