Regional Information Center for Science and Technology (RICeST) - Optimizing many-field packet classification on FPGA, multi-core general purpose processor, and GPU

DocumentCode :

2696756

Title :

Optimizing many-field packet classification on FPGA, multi-core general purpose processor, and GPU

Author :

Qu, Yun R. ; Zhang, Hao H. ; Zhou, Shijie ; Prasanna, Viktor K.

Author_Institution :

Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA

fYear :

2015

fDate :

7-8 May 2015

Firstpage :

Lastpage :

Abstract :

Due to the rapid growth of the Internet, there is an increasing need to efficiently classify packets with many header fields against large rule sets. For example, in Software Defined Networking (SDN), the OpenFlow table lookup can require 15 packet header fields to be examined. In this paper, we present several decomposition-based packet classification implementations with efficient optimization techniques. In the searching phase, packet headers are split or combined. In the merging phase, the partial searching results from all the fields are merged to generate the final result. We prototype our implementations on a state-of-the-art Field Programmable Gate Array (FPGA), multi-core General Purpose Processor (GPP), and Graphics Processing Unit (GPU). On FPGA, we propose two optimization techniques to divide generic ranges; modular processing elements are constructed and concatenated into a systolic array. On multi-core GPP, we parallelize both the searching and merging phases using parallel program threads. On the GPU-accelerated platform, we minimize branch divergence and reduce the data communication overhead. Experimental results show that 500 Million Packets Per Second (MPPS) throughput and 3 μs latency can be achieved for 1.5K rule sets on FPGA. We achieve 14.7 MPPS throughput and 30.5 MPPS throughput for 32K rule sets on multi-core GPP and GPU-accelerated platforms, respectively. As a heterogeneous solution, our GPU-accelerated packet classifier shows a 2x speedup compared to the implementation using multi-core GPP only. Compared with prior works, our designs can match long packet headers against very complex rule sets.
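
The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how partial results are represented, so the sketch assumes each per-field search returns a bit vector over the rules and that merging is a bitwise AND followed by a priority pick. The `Rule` layout, the function names `search_field` and `classify`, and the "lower index = higher priority" convention are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed rule layout: one inclusive (low, high) range per header field.
# Rule index doubles as priority (lower index = higher priority).
Rule = List[Tuple[int, int]]


def search_field(rules: List[Rule], field: int, value: int) -> int:
    """Searching phase for one field.

    Returns a bit vector (stored in an int) whose bit i is set when
    rule i matches `value` in the given field.
    """
    bits = 0
    for i, rule in enumerate(rules):
        low, high = rule[field]
        if low <= value <= high:
            bits |= 1 << i
    return bits


def classify(rules: List[Rule], header: List[int]) -> Optional[int]:
    """Merging phase.

    ANDs the per-field bit vectors and returns the index of the
    highest-priority rule that matched every field, or None.
    """
    if not rules:
        return None
    merged = (1 << len(rules)) - 1  # start with every rule as a candidate
    for field, value in enumerate(header):
        merged &= search_field(rules, field, value)
        if merged == 0:
            return None  # no rule survives; stop early
    # Isolate the lowest set bit and convert it to a rule index.
    return (merged & -merged).bit_length() - 1


# Hypothetical two-field rule set: (address byte range, port range).
example_rules: List[Rule] = [
    [(0, 10), (80, 80)],
    [(0, 255), (0, 65535)],
    [(20, 30), (443, 443)],
]
```

With this rule set, `classify(example_rules, [5, 80])` selects rule 0, while `classify(example_rules, [25, 443])` selects rule 1, because the catch-all rule has a lower index than rule 2. Each field is searched independently of the others, which is the property that lets the searching phase be spread across pipeline stages, threads, or GPU lanes on the platforms the abstract lists.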

Keywords :

field programmable gate arrays; graphics processing units; multiprocessing systems; optimisation; parallel programming; pattern classification; systolic arrays; FPGA; GPP; GPU-accelerated platform; Internet; OpenFlow table lookup; SDN; branch divergence minimization; data communication overhead reduction; decomposition-based packet classification; field programmable gate array; graphics processing unit; header fields; many-field packet classification optimization; merging phase; modular processing elements; multicore general purpose processor; optimization techniques; packet classification; parallel program threads; partial searching; searching phase; software defined networking; systolic array; time 3 μs; Arrays; Field programmable gate arrays; Graphics processing units; Merging; Optimization; Pipelines; Throughput;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Architectures for Networking and Communications Systems (ANCS), 2015 ACM/IEEE Symposium on

Conference_Location :

Oakland, CA

Type :

conf

DOI :

10.1109/ANCS.2015.7110123

Filename :

7110123

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2696756