DocumentCode
186357
Title
The importance and characteristics of communication in high performance data analytics
Author
Anghel, Andreea ; Rodriguez, German ; Prisacari, Bogdan
Author_Institution
IBM Res., Zurich, Switzerland
fYear
2014
fDate
26-28 Oct. 2014
Firstpage
80
Lastpage
81
Abstract
Social network and business analytics applications typically need to process vast amounts of data that are often modeled as graphs. The scale of the data requires large-scale distributed computing systems, together with scalable parallel algorithms, to process the graphs efficiently. Representative of this graph-based analytics class of applications is the Graph 500 benchmark (Murphy et al., 2010), which is designed to assess the performance of supercomputing systems by solving the Breadth-First Search (BFS) graph traversal problem. In this work, we analyze the network data motion of an MPI version of the Graph 500 graph traversal problem on a large-scale high-performance computing system, the MareNostrum III supercomputer (http://www.bsc.es/marenostrum-support-services/mn3). We focus our analysis on node-to-node communication and show that the application runtime is communication-bound, with communication making up as much as 80% of the execution time of each BFS iteration. We also show that the dominant communication pattern is an overall all-to-all exchange (every process communicates with every other process, and roughly the same amount of data is exchanged between any two processes), thus providing preliminary guidance for future application or network design optimization efforts.
Keywords
data analysis; graph theory; parallel algorithms; parallel machines; tree searching; BFS graph traversal problem; Graph 500 MPI network data motion analysis; Graph 500 benchmark; MareNostrum III supercomputer; breadth-first search graph traversal problem; business analytics; graph-based analytics; high performance data analytics; large-scale distributed computing systems; large-scale high-performance computing system; network design optimization; node-to-node communication; scalable parallel algorithms; social networks; supercomputing systems; Benchmark testing; Computational modeling; Data analysis; Distributed databases; Electric breakdown; Instruments; Runtime;
fLanguage
English
Publisher
IEEE
Conference_Titel
Workload Characterization (IISWC), 2014 IEEE International Symposium on
Conference_Location
Raleigh, NC
Print_ISBN
978-1-4799-6452-9
Type
conf
DOI
10.1109/IISWC.2014.6983044
Filename
6983044