Authors:
Fadika, Zacharia; Govindaraju, M.; Canon, Richard; Ramakrishnan, Lavanya
Abstract:
Emerging sensor networks, more capable instruments, and ever-increasing simulation scales are generating data at a rate that exceeds our ability to effectively manage, curate, analyze, and share it. Data-intensive computing is expected to revolutionize the next-generation software stack. Hadoop, an open-source implementation of the MapReduce model, provides a way for large data volumes to be processed seamlessly on clusters of commodity computers. The parallelization, synchronization, and fault tolerance inherent in the model make it well suited to highly parallel data-intensive applications. MapReduce and Hadoop have traditionally been used for web data processing and have only recently been applied to scientific applications. There is limited understanding of the performance characteristics that data-intensive scientific applications can obtain from MapReduce and Hadoop. It is therefore important to evaluate Hadoop specifically for data-intensive scientific operations -- filter, merge, and reorder -- to understand its design considerations and performance trade-offs. In this paper, we evaluate Hadoop for these data operations in the context of High Performance Computing (HPC) environments to understand the impact of the file system, the network, and programming modes on performance.
Keywords:
data handling; file organisation; natural sciences computing; parallel processing; public domain software; HPC; Hadoop evaluation; MapReduce model; Web data processing; data-intensive computing; data-intensive scientific operations; fault-tolerance; file system; filter operation; high performance computing environments; highly-parallel data-intensive applications; merge operation; network modes; next-generation software stack; open source implementation; parallelization; programming modes; reorder operation; synchronization; Cloud computing; Conferences
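
To make the "filter" operation concrete, the following is a minimal sketch of a map-only Hadoop MapReduce job that keeps only input records containing a given substring. It is not taken from the paper; the class names, the filter.pattern configuration key, and the command-line argument layout are illustrative assumptions.

// FilterJob.java -- hypothetical map-only filter over text records.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FilterJob {

  // Emits a record unchanged if it contains the pattern; drops it otherwise.
  public static class FilterMapper
      extends Mapper<Object, Text, Text, NullWritable> {
    private String pattern;

    @Override
    protected void setup(Context context) {
      pattern = context.getConfiguration().get("filter.pattern", "");
    }

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      if (value.toString().contains(pattern)) {
        context.write(value, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // args: <input path> <output path> <substring to keep>
    Configuration conf = new Configuration();
    conf.set("filter.pattern", args[2]);
    Job job = Job.getInstance(conf, "filter");
    job.setJarByClass(FilterJob.class);
    job.setMapperClass(FilterMapper.class);
    job.setNumReduceTasks(0);   // map-only: a pure filter needs no shuffle or reduce phase
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Setting the number of reduce tasks to zero illustrates one of the design choices the abstract alludes to: a filter can avoid the shuffle entirely, whereas merge and reorder operations exercise the framework's sort and shuffle machinery and are therefore more sensitive to the file system and network configuration.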