  • DocumentCode
    3462333
  • Title
    Using MapReduce for High Energy Physics Data Analysis
  • Author
    Glaser, Fabian ; Neukirchen, Helmut ; Rings, Thomas ; Grabowski, Jens
  • Author_Institution
    Inst. of Comput. Sci., Univ. of Göttingen, Göttingen, Germany
  • fYear
    2013
  • fDate
    3-5 Dec. 2013
  • Firstpage
    1271
  • Lastpage
    1278
  • Abstract
    At the Large Hadron Collider (LHC) High Energy Physics (HEP) experiment at CERN, 15 PB of raw data is recorded per year. As it was considered inconvenient to store, access and process this data using the traditional hardware and software tools, this data gets reduced to 10-200 TB per year. This paper investigates the applicability of the MapReduce paradigm for analyzing HEP data. In a case study, a sample HEP analysis that makes use of the HEP analysis framework ROOT has been re-implemented using the MapReduce implementation Apache Hadoop. In addition, a Hadoop input format has been developed that takes storage locality of the ROOT file format into account. This approach was evaluated in a cloud computing environment and compared to data analysis with the Parallel ROOT Facility (PROOF).
  • Keywords
    data analysis; high energy physics instrumentation computing; Apache Hadoop input format; CERN; HEP data analysis framework ROOT; HEP experiment; LHC; Large Hadron Collider; MapReduce; PROOF; Parallel ROOT Facility; ROOT file format; cloud computing environment; hardware tools; high energy physics data analysis; software tools; storage locality; Cloud computing; Computer architecture; Distributed databases; Measurement; Physics; Hadoop; High Energy Physics; Input format; ROOT
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
  • Conference_Location
    Sydney, NSW
  • Type
    conf
  • DOI
    10.1109/CSE.2013.189
  • Filename
    6755371