DocumentCode :
3200401
Title :
Efficient and Simplified Parallel Graph Processing over CPU and MIC
Author :
Linchuan Chen ; Xin Huo ; Bin Ren ; Surabhi Jain ; Gagan Agrawal
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
819
Lastpage :
828
Abstract :
Intel Xeon Phi (MIC architecture) is a relatively new accelerator chip, which combines large-scale shared memory parallelism with wide SIMD lanes. Mapping applications on a node with such an architecture to achieve high parallel efficiency is a major challenge. In this paper, we focus on developing a system for heterogeneous graph processing, which is able to utilize both a many-core Xeon Phi and a multi-core CPU on one node. We propose a simple programming API with an intuitive interface for expressing SIMD parallelism. We develop efficient techniques for supporting our high-level API, focusing on exploiting wide SIMD lanes, a massive number of cores, and partitioning of the work across CPU and accelerator, while handling the irregularity of graph applications. The components of our runtime system include a condensed static memory buffer, which supports efficient message insertion and SIMD message reduction while keeping memory requirements low, and, specifically for MIC, a pipelining scheme for efficient message generation by avoiding frequent locking operations. In addition, a hybrid graph partitioning module is able to effectively partition the workload between the CPU and the MIC, ensuring balanced workload and low communication overhead. The main observations from our experimental evaluation using five popular applications are: for MIC executions, the pipelining scheme is up to 3.36x faster than a naive approach using locking-based message generation, and the speedup over OpenMP ranges from 1.17 to 4.15. Heterogeneous CPU-MIC execution achieves a speedup of up to 1.41 over the better of the CPU-only and MIC-only executions.
Keywords :
application program interfaces; graph theory; multiprocessing systems; parallel architectures; pipeline processing; shared memory systems; Intel Xeon Phi architecture; MIC architecture; OpenMP; SIMD lanes; SIMD message reduction; accelerator chip; balanced workload; condensed static memory buffer; heterogeneous graph processing; high-level API; hybrid graph partitioning module; large-scale shared memory parallelism; locking based message generation; low communication overhead; many-core Xeon Phi; message insertion; multicore CPU on one node; node mapping; parallel graph processing; pipelining scheme; simple programming API; Indexes; Message systems; Microwave integrated circuits; Multicore processing; Parallel processing; Programming; Runtime; CPU; Graph Processing; Intel MIC; Programming Model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.88
Filename :
7161568