DocumentCode :
186362
Title :
Towards realistic benchmarking for cloud file systems: Early experiences
Author :
Zujie Ren ; Weisong Shi ; Jian Wan
Author_Institution :
Sch. of Comput. Sci., Hangzhou Dianzi Univ., Hangzhou, China
fYear :
2014
fDate :
26-28 Oct. 2014
Firstpage :
88
Lastpage :
98
Abstract :
Over the past few years, cloud file systems such as Google File System (GFS) and Hadoop Distributed File System (HDFS) have attracted considerable research effort aimed at optimizing their designs and implementations. A common issue in this work is performance benchmarking. Unfortunately, many system researchers and engineers struggle to build a benchmark that reflects real-life workloads, owing to the complexity of cloud file systems and the vagueness of I/O workload characteristics. They can easily make incorrect assumptions about their systems and workloads, leading to benchmark results that diverge from reality. As a preliminary step toward designing a realistic benchmark, we explore the characteristics of data and I/O workloads in a production environment. We collected a two-week I/O workload trace from a 2,500-node production cluster, which is one of the largest cloud platforms in Asia. This cloud platform provides two public cloud services: a data storage service (DSS) and a data processing service (DPS). We analyze the commonalities and differences between the two cloud services from multiple perspectives, including request arrival pattern, request size, and data population. The study highlights eight key observations, including: the arrival rate of requests follows a lognormal distribution rather than a Poisson distribution; request arrivals exhibit multiple periodicities; and cloud file systems fit a partly-open model rather than a purely open or closed model. Based on the comparative analysis, we derive several implications to guide system researchers and engineers in building realistic benchmarks for their own systems. Finally, we discuss several open issues and challenges in benchmarking cloud file systems.
Keywords :
cloud computing; log normal distribution; storage management; Asia; DPS; DSS; GFS; Google File System; HDFS; Hadoop Distributed File System; I-O workload characteristics; cloud file systems; data population; data processing service; data storage service; lognormal distribution; partly-open model; performance benchmarking; production cluster; public cloud services; request arrival pattern; request size; time 2 week; Benchmark testing; Cloud computing; Decision support systems; File systems; Memory; Production; Servers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Workload Characterization (IISWC), 2014 IEEE International Symposium on
Conference_Location :
Raleigh, NC
Print_ISBN :
978-1-4799-6452-9
Type :
conf
DOI :
10.1109/IISWC.2014.6983048
Filename :
6983048