DocumentCode :
2696756
Title :
Optimizing many-field packet classification on FPGA, multi-core general purpose processor, and GPU
Author :
Qu, Yun R. ; Zhang, Hao H. ; Zhou, Shijie ; Prasanna, Viktor K.
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2015
fDate :
7-8 May 2015
Firstpage :
87
Lastpage :
98
Abstract :
Due to the rapid growth of the Internet, there is an increasing need to efficiently classify packets with many header fields against large rule sets. For example, in Software Defined Networking (SDN), the OpenFlow table lookup can require 15 packet header fields to be examined. In this paper, we present several decomposition-based packet classification implementations with efficient optimization techniques. In the searching phase, packet headers are split or combined. In the merging phase, the partial searching results from all the fields are merged to generate the final result. We prototype our implementations on a state-of-the-art Field Programmable Gate Array (FPGA), multi-core General Purpose Processor (GPP), and Graphics Processing Unit (GPU). On FPGA, we propose two optimization techniques to divide generic ranges; modular processing elements are constructed and concatenated into a systolic array. On multi-core GPP, we parallelize both the searching and merging phases using parallel program threads. On the GPU-accelerated platform, we minimize branch divergence and reduce the data communication overhead. Experimental results show that 500 Million Packets Per Second (MPPS) throughput and 3 μs latency can be achieved for 1.5K rule sets on FPGA. We achieve 14.7 MPPS throughput and 30.5 MPPS throughput for 32K rule sets on multi-core GPP and GPU-accelerated platforms, respectively. As a heterogeneous solution, our GPU-accelerated packet classifier shows a 2x speedup compared to the implementation using multi-core GPP only. Compared with prior works, our designs can match long packet headers against very complex rule sets.
Keywords :
field programmable gate arrays; graphics processing units; multiprocessing systems; optimisation; parallel programming; pattern classification; systolic arrays; FPGA; GPP; GPU-accelerated platform; Internet; OpenFlow table lookup; SDN; branch divergence minimization; data communication overhead reduction; decomposition-based packet classification; field programmable gate array; graphics processing unit; header fields; many-field packet classification optimization; merging phase; modular processing elements; multicore general purpose processor; optimization techniques; packet classification; parallel program threads; partial searching; searching phase; software defined networking; systolic array; time 3 μs; Arrays; Field programmable gate arrays; Graphics processing units; Merging; Optimization; Pipelines; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Architectures for Networking and Communications Systems (ANCS), 2015 ACM/IEEE Symposium on
Conference_Location :
Oakland, CA
Type :
conf
DOI :
10.1109/ANCS.2015.7110123
Filename :
7110123