DocumentCode
1925725
Title
Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?
Author
Loebman, Sarah ; Nunley, Dylan ; Kwon, YongChul ; Howe, Bill ; Balazinska, Magdalena ; Gardner, Jeffrey P.
Author_Institution
Univ. of Washington, Seattle, WA, USA
fYear
2009
fDate
Aug. 31 2009-Sept. 4 2009
Firstpage
1
Lastpage
10
Abstract
As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel database management systems (DBMSs) and massive-scale data processing systems such as MapReduce hold promise to address this challenge. However, since these systems have not been expressly designed for scientific applications, their efficacy in this domain has not been thoroughly tested. In this paper, we study the performance of these engines in one specific domain: massive astrophysical simulations. We develop a use case that comprises five representative queries. We implement this use case in one distributed DBMS and in the Pig/Hadoop system. We compare the performance of the tools to each other and to hand-written IDL scripts. We find that certain representative analyses are easy to express in each engine's high-level language, and that both systems provide competitive performance and improved scalability relative to current IDL-based methods.
Keywords
data analysis; parallel databases; query processing; relational databases; software management; MapReduce program; distributed DBMS system; high level language; interactive data language; massive astrophysical simulations; massive-scale data processing systems; parallel database management systems; pig-hadoop system; queries; relational DBMS; Application software; Data analysis; Data processing; Database systems; Engines; Fuels; High level languages; Performance analysis; Scalability; System testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on
Conference_Location
New Orleans, LA
ISSN
1552-5244
Print_ISBN
978-1-4244-5011-4
Electronic_ISBN
1552-5244
Type
conf
DOI
10.1109/CLUSTR.2009.5289149
Filename
5289149