DocumentCode :
1626902
Title :
Efficient prefetching technique for storage of heterogeneous small files in Hadoop Distributed File System Federation
Author :
Aishwarya, K. ; Arvind Ram, A. ; Sreevatson, M.C. ; Babu, Chitra ; Prabavathy, B.
fYear :
2013
Firstpage :
523
Lastpage :
530
Abstract :
Hadoop Distributed File System Federation [5] is used to store and manage large files. It has been used in a university scenario to store various categories of files such as PDFs, audio, video, presentation and image files. However, HDFS Federation suffers a performance penalty when storing a large number of small files. Also, scaling the namenodes in HDFS Federation does not solve the small files problem [7] but only delays the metadata accumulation. One approach to handling this problem was implemented in BlueSky [1], one of the most prevalent e-learning resources in China. However, that system does not handle files from heterogeneous users, and its prefetching mechanism takes into account only the locality of reference and does not consider file access patterns. The objective of this paper is to address the above-mentioned shortcomings by developing an efficient approach to handling files from heterogeneous users and by devising an efficient prefetching algorithm based on file access patterns. The file access patterns are stored and updated in a priority heap. Heterogeneous users can upload their files, and complete transparency is maintained in grouping small files into a large file. This approach of merging several small files into a large file reduces the memory footprint in Federated HDFS. In addition to the existing features, this paper also provides options to modify and delete the files stored by users in Federated HDFS. The performance of the original HDFS Federation and of the proposed system is benchmarked with a set of 100,000 small files. The experimental results show that memory usage was reduced by 36% relative to the original HDFS Federation. File read time was reduced by 94% (with prefetching based on file access patterns) compared to the proposed system without prefetching, and by 92% compared to prefetching based on the locality of reference.
Keywords :
distributed databases; meta data; network operating systems; storage management; BlueSky; China; HDFS federation; Hadoop distributed file system federation; e-learning resources; file access patterns; file access prefetching; file read time; heterogeneous small file storage; heterogeneous users; memory footprint; metadata accumulation; prefetching mechanism; prefetching technique; Delays; Educational institutions; Heart beat; Indexes; Lead; Merging; Prefetching; HDFS Federation; files access pattern; metadata; prefetching; small files problem;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computing (ICoAC), 2013 Fifth International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3447-8
Type :
conf
DOI :
10.1109/ICoAC.2013.6922006
Filename :
6922006