Regional Information Center for Science and Technology (RICeST) - Efficient and Simplified Parallel Graph Processing over CPU and MIC

DocumentCode :

3200401

Title :

Efficient and Simplified Parallel Graph Processing over CPU and MIC

Author :

Linchuan Chen ; Xin Huo ; Bin Ren ; Jain, Surabhi ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2015

fDate :

25-29 May 2015

Firstpage :

819

Lastpage :

828

Abstract :

Intel Xeon Phi (MIC architecture) is a relatively new accelerator chip, which combines large-scale shared memory parallelism with wide SIMD lanes. Mapping applications on a node with such an architecture to achieve high parallel efficiency is a major challenge. In this paper, we focus on developing a system for heterogeneous graph processing, which is able to utilize both a many-core Xeon Phi and a multi-core CPU on one node. We propose a simple programming API with an intuitive interface for expressing SIMD parallelism. We develop efficient techniques for supporting our high-level API, focusing on exploiting wide SIMD lanes, a massive number of cores, and partitioning of the work across CPU and accelerator, while handling the irregularity of graph applications. The components of our runtime system include a condensed static memory buffer, which supports efficient message insertion and SIMD message reduction while keeping memory requirements low, and, specifically for MIC, a pipelining scheme for efficient message generation that avoids frequent locking operations. In addition, a hybrid graph partitioning module effectively partitions the workload between the CPU and the MIC, ensuring balanced workload and low communication overhead. The main observations from our experimental evaluation using five popular applications are: for MIC executions, the pipelining scheme is up to 3.36x faster than a naive approach using locking-based message generation, and the speedup over OpenMP ranges from 1.17 to 4.15. Heterogeneous CPU-MIC execution achieves a speedup of up to 1.41 over the better of the CPU-only and MIC-only executions.

Keywords :

application program interfaces; graph theory; multiprocessing systems; parallel architectures; pipeline processing; shared memory systems; Intel Xeon Phi architecture; MIC architecture; OpenMP; SIMD lanes; SIMD message reduction; accelerator chip; balanced workload; condensed static memory buffer; heterogeneous graph processing; high-level API; hybrid graph partitioning module; large-scale shared memory parallelism; locking based message generation; low communication overhead; many-core Xeon Phi; message insertion; multicore CPU on one node; node mapping; parallel graph processing; pipelining scheme; simple programming API; Indexes; Message systems; Microwave integrated circuits; Multicore processing; Parallel processing; Programming; Runtime; CPU; Graph Processing; Intel MIC; Programming Model;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International

Conference_Location :

Hyderabad

ISSN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2015.88

Filename :

7161568

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3200401