Accelerating network applications on X86-64 platforms

Author

Xia, Gao; Liu, Bin

Author_Institution

Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China

fYear

2010

fDate

22-25 June 2010

Firstpage

906

Lastpage

912

Abstract

Emerging multi-core platforms provide a high-performance, easy-to-develop and flexible way to implement high-speed network applications. For example, multi-core solutions for layer 7 protocol identification achieve processing speeds of 2 Gbps or higher. However, they occupy most of the processing cores in the system, leaving limited headroom for more complex tasks such as intrusion detection/prevention, anti-malware and data loss prevention. Based on a deep understanding of the application bottlenecks and of the optimization techniques available on X86-64 platforms, we achieve the same or higher speed using only one core, freeing resources for further processing. In this paper, we perform a deep system-wide profile and analyze the major hotspots of a typical network system on an Intel X86-64 platform, comprising a complete TCP/IP stack and a protocol identification engine based on deep packet inspection (DPI). The profiling results and analysis show that network applications spanning layer 2 to layer 7 processing are inherently memory and computation intensive. We then propose optimization guidelines and techniques: 1) removing memory bottlenecks: combining independent irregular memory accesses to hide latency, using software-assisted cache prefetching to improve the cache hit ratio, and mapping memory from kernel space to user space to reduce access overhead; 2) removing computation bottlenecks: using 64-bit registers and instructions to speed up common computations, and using Streaming SIMD Extensions (SSE) instructions to accelerate particularly time-consuming tasks. Compared to state-of-the-art multi-core implementations, our implementation on the X86-64 platform, using only one core, delivers the same or higher processing speed: 7 Gbps with an average packet size of 501 bytes and 2 Gbps with an average packet size of 110 bytes.
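The first memory guideline in the abstract (combining independent irregular accesses and prefetching them in software) might look roughly like the C sketch below. It is not taken from the paper: the direct-mapped table layout, the `slot_t` type, the `BATCH` size, the hash constant and the function names are all hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin used here as one possible way to issue the prefetch. The idea is to compute the slot addresses for a whole batch of keys first and prefetch them, so that the cache misses overlap instead of being paid one after another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH 8 /* hypothetical batch size, to be tuned per platform */

/* Hypothetical direct-mapped table slot; key 0 marks an empty slot. */
typedef struct {
    uint64_t key;   /* e.g. a 64-bit flow hash */
    uint32_t value; /* e.g. a protocol identifier */
} slot_t;

/* One 64-bit multiply spreads the key bits; mask selects the slot. */
static inline size_t slot_index(uint64_t key, size_t mask)
{
    return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) & mask;
}

/* Store a key/value pair, overwriting whatever occupied the slot. */
void table_put(slot_t *table, size_t table_size, uint64_t key, uint32_t value)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    slot_t *s = &table[slot_index(key, table_size - 1)];
    s->key = key;
    s->value = value;
}

/*
 * Look up n nonzero keys. out[i] receives the stored value, or 0 on a
 * miss. Returns the number of hits.
 */
size_t lookup_batch(const slot_t *table, size_t table_size,
                    const uint64_t *keys, size_t n, uint32_t *out)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    const size_t mask = table_size - 1;
    size_t hits = 0;

    for (size_t base = 0; base < n; base += BATCH) {
        size_t m = (n - base < BATCH) ? (n - base) : BATCH;
        size_t idx[BATCH];

        /* Pass 1: compute every slot address and request it early. */
        for (size_t i = 0; i < m; i++) {
            idx[i] = slot_index(keys[base + i], mask);
            __builtin_prefetch(&table[idx[i]], 0, 3); /* read, keep in cache */
        }

        /* Pass 2: the independent loads are now in flight together. */
        for (size_t i = 0; i < m; i++) {
            const slot_t *s = &table[idx[i]];
            if (s->key == keys[base + i]) {
                out[base + i] = s->value;
                hits++;
            } else {
                out[base + i] = 0;
            }
        }
    }
    return hits;
}
```

A caller would fill the table with `table_put` and then hand `lookup_batch` the keys of a burst of packets. Splitting the work into two passes costs a small index array per batch, but it keeps the lookups independent of each other, which is what allows the memory latency to be hidden.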

Keywords

Clocks; Delay; Kernel; Magnetic cores; Optimization; Pattern matching; Protocols

fLanguage

English

Publisher

IEEE

Conference_Title

Computers and Communications (ISCC), 2010 IEEE Symposium on

Conference_Location

Riccione, Italy

ISSN

1530-1346

Print_ISBN

978-1-4244-7754-8

Type

conf

DOI

10.1109/ISCC.2010.5546496

Filename

5546496