DocumentCode :
611031
Title :
HDFS+: Concurrent Writes Improvements for HDFS
Author :
Kun Lu ; Dong Dai ; MingMing Sun
Author_Institution :
Dept. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
fDate :
13-16 May 2013
Firstpage :
182
Lastpage :
183
Abstract :
HDFS is a popular distributed file system that provides high scalability and throughput, but it lacks built-in support for multi-source data generation, which arises naturally in many applications such as log mining and data analysis. In a basic HDFS environment, a data collection step is required before analysis because much of the data, such as logs, resides on local disks. We propose a solution that composes many existing files into a single file and is well suited to concurrent writes by many data producers. Programs then only need to implement data processing against a single file, without a separate data collection step. We implemented HDFS+ by modifying the existing HDFS and evaluated it with applications including log analysis. Our results show substantial throughput improvements for concurrent data writes. HDFS+ greatly simplifies the data collection step in the data analysis procedure.
Keywords :
concurrency control; data analysis; data mining; file organisation; HDFS environment; HDFS+; Hadoop Distributed File System; data analysis; data collection; data concurrent writes; data processing; distributed file system; log analysis; log mining; multisource data generation; throughput improvement; Data analysis; Dispersion; Distributed databases; Fault tolerance; Fault tolerant systems; File systems; Throughput; HDFS+; fragment; self-representable;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Conference_Location :
Delft
Print_ISBN :
978-1-4673-6465-2
Type :
conf
DOI :
10.1109/CCGrid.2013.41
Filename :
6546083