DocumentCode
3462333
Title
Using MapReduce for High Energy Physics Data Analysis
Author
Glaser, Fabian ; Neukirchen, Helmut ; Rings, Thomas ; Grabowski, Jens
Author_Institution
Inst. of Comput. Sci., Univ. of Göttingen, Göttingen, Germany
fYear
2013
fDate
3-5 Dec. 2013
Firstpage
1271
Lastpage
1278
Abstract
At the Large Hadron Collider (LHC) High Energy Physics (HEP) experiment at CERN, 15 PB of raw data is recorded per year. Because storing, accessing, and processing this data with traditional hardware and software tools was considered inconvenient, the data is reduced to 10-200 TB per year. This paper investigates the applicability of the MapReduce paradigm for analyzing HEP data. In a case study, a sample HEP analysis that uses the HEP analysis framework ROOT was re-implemented with the MapReduce implementation Apache Hadoop. In addition, a Hadoop input format was developed that takes the storage locality of the ROOT file format into account. The approach was evaluated in a cloud computing environment and compared to data analysis with the Parallel ROOT Facility (PROOF).
Keywords
data analysis; high energy physics instrumentation computing; Apache Hadoop input format; CERN; HEP data analysis framework ROOT; HEP experiment; LHC; Large Hadron Collider; MapReduce; PROOF; Parallel ROOT Facility; ROOT file format; cloud computing environment; hardware tools; high energy physics data analysis; software tools; storage locality; Cloud computing; Computer architecture; Data analysis; Distributed databases; Large Hadron Collider; Measurement; Physics; Cloud computing; Hadoop; High Energy Physics; Input format; MapReduce; PROOF; ROOT;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location
Sydney, NSW
Type
conf
DOI
10.1109/CSE.2013.189
Filename
6755371