Title :
Evaluation and Optimization of Breadth-First Search on NUMA Cluster
Author :
Cui, Zehan ; Chen, Licheng ; Chen, Mingyu ; Bao, Yungang ; Huang, Yongbing ; Lv, Huiwei
Author_Institution :
State Key Lab. of Comput. Archit., Inst. of Comput. Technol., Beijing, China
Abstract :
Graph is widely used in many areas. Breadth-First Search (BFS), a key subroutine for many graph analysis algorithms, has become the primary benchmark for Graph500 ranking. Due to the high communication cost of BFS, multi-socket nodes with large memory capacity (NUMA) are supposed to reduce network pressure. However, the longer latency to remote memory may cause problem if not treated well. In this work, we first demonstrate that simply spawning and binding one MPI process for each socket can achieve the best performance for MPI/OpenMP hybrid programmed BFS algorithm, resulting in 1.53X of performance on 16 nodes. Nevertheless, we notice that one MPI process per socket may exacerbate the communication cost. We propose to share some communication data structure among the processes inside the same node, to eliminate most of the intra-node communication. To fully utilize the network bandwidth, we make all the processes in a node to perform communication simultaneously. We further adjust the granularity of a key bitmap for better cache locality to speed up the computation. With all the optimizations for NUMA, communication and computation together, 2.44X of performance is achieved on 16 nodes, which is 39.2 Billion Traversed Edges per Second for an R-MAT graph of scale 32 (4 billion vertices and 64 billion edges).
Keywords :
application program interfaces; graph theory; memory architecture; optimisation; storage management; tree searching; BFS; Graph500 ranking; MPI process; NUMA cluster; OpenMP hybrid programming; R-MAT graph; breadth first search; communication data structure; graph analysis algorithm; intranode communication; memory capacity; multisocket node; network bandwidth; optimization; remote memory; Algorithm design and analysis; Bandwidth; Clustering algorithms; Computer architecture; Instruction sets; Optimization; Sockets; Allgather; BFS; Graph; MPI/OpenMP; NUMA;
Conference_Titel :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
DOI :
10.1109/CLUSTER.2012.29