• DocumentCode
    186381
  • Title

    Characterizing and subsetting big data workloads

  • Author

    Zhen Jia ; Jianfeng Zhan ; Lei Wang ; Rui Han ; Sally A. McKee ; Qiang Yang ; Chunjie Luo ; Jingwei Li

  • Author_Institution
    State Key Lab. Comput. Archit., Inst. of Comput. Technol., Beijing, China
  • fYear
    2014
  • fDate
    26-28 Oct. 2014
  • Firstpage
    191
  • Lastpage
    201
  • Abstract
    Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates these challenges. In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principal components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
  • Keywords
    Big Data; digital simulation; pattern clustering; principal component analysis; Big Data benchmark suites; Big Data workload characterization; Big Data workload subsetting; BigDataBench; PCA; clustering technique; simulation-based research methods; software stacks; Benchmark testing; Couplings; Measurement; Microarchitecture; Software; Sparks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Workload Characterization (IISWC), 2014 IEEE International Symposium on
  • Conference_Location
    Raleigh, NC
  • Print_ISBN
    978-1-4799-6452-9
  • Type

    conf

  • DOI
    10.1109/IISWC.2014.6983058
  • Filename
    6983058
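
The three steps named in the abstract (PCA over per-workload metrics, clustering in principal-component space, then keeping one representative per cluster) can be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not name the clustering technique, so k-means is an assumption here, as are the 90% variance threshold, the nearest-to-centroid selection rule, the function names, and the synthetic 30 x 45 metric matrix that stands in for real measurements.

```python
import numpy as np

def pca_scores(X, var_kept=0.9):
    """Project workloads (rows) onto the leading principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant metrics
    Z = (X - X.mean(axis=0)) / std           # standardize each metric
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # Smallest number of components whose cumulative variance reaches var_kept.
    k = int(np.searchsorted(np.cumsum(explained), var_kept) + 1)
    k = min(k, len(S))
    return Z @ Vt[:k].T

def assign(P, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(P, k, iters=100, seed=0):
    """Plain Lloyd's k-means (assumed technique, not confirmed by the source)."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(P, centers)
        new = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(P, centers), centers

def representatives(P, labels, centers):
    """Pick the workload closest to each non-empty cluster's centroid."""
    reps = []
    for j in range(len(centers)):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        dist = np.linalg.norm(P[idx] - centers[j], axis=1)
        reps.append(int(idx[dist.argmin()]))
    return sorted(reps)

# Synthetic stand-in data: 30 hypothetical workloads x 45 metrics, 3 groups.
rng = np.random.default_rng(0)
group_means = rng.normal(0.0, 5.0, size=(3, 45))
X = np.vstack([m + rng.normal(0.0, 1.0, size=(10, 45)) for m in group_means])

P = pca_scores(X)
labels, centers = kmeans(P, k=3)
reps = representatives(P, labels, centers)
print("components kept:", P.shape[1])
print("representative workload indices:", reps)
```

With real data, each row of `X` would hold the measured metrics of one workload, and the number of clusters would be chosen to match the desired subset size (seven in the paper).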