• DocumentCode
    3507981
  • Title
    Accelerating network applications on X86-64 platforms
  • Author
    Xia, Gao ; Liu, Bin
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2010
  • fDate
    22-25 June 2010
  • Firstpage
    906
  • Lastpage
    912
  • Abstract
    Emerging multi-core platforms provide a high-performance, easy-to-develop and flexible way to implement high-speed network applications. For example, multi-core solutions for layer 7 protocol identification achieve processing speeds of 2 Gbps or higher. However, they occupy most of the processing cores in the system, leaving limited headroom for more complex processing such as intrusion detection/prevention, anti-malware, and data loss prevention. Based on a deep understanding of the application bottlenecks and of the optimization techniques available on X86-64 platforms, we achieve the same or higher speed using only one core, leaving more resources for further processing. In this paper, we perform a deep system-wide profile and analyze the major hotspots of a typical network system on an Intel X86-64 platform, including a complete TCP/IP stack and a protocol identification engine based on deep packet inspection (DPI). The profiling results and analysis show that network applications containing layer 2 to layer 7 processing are inherently memory and computation intensive. We then propose optimization guidelines and techniques: 1) removing memory bottlenecks: combining independent irregular memory accesses to hide latency, using software-assisted cache prefetching to improve the cache hit ratio, and mapping memory from kernel space to user space to reduce access overhead; 2) removing computation bottlenecks: using 64-bit registers and instructions to speed up common computations, and using Streaming SIMD Extensions (SSE) instructions to accelerate particularly time-consuming tasks. Compared to state-of-the-art multi-core implementations, our implementation on the X86-64 platform, using only one core, delivers the same or higher processing speed: 7 Gbps at an average packet size of 501 bytes and 2 Gbps at an average packet size of 110 bytes.
  • Keywords
    Clocks; Delay; Kernel; Magnetic cores; Optimization; Pattern matching; Protocols;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computers and Communications (ISCC), 2010 IEEE Symposium on
  • Conference_Location
    Riccione, Italy
  • ISSN
    1530-1346
  • Print_ISBN
    978-1-4244-7754-8
  • Type
    conf
  • DOI
    10.1109/ISCC.2010.5546496
  • Filename
    5546496
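
The first memory guideline in the abstract (combining independent irregular memory accesses and using software prefetch) can be illustrated with a minimal C sketch. This is not the authors' code: the flow table, the names `flow_entry`, `bucket_of` and `count_batch`, the table size, the batch size and the hash mixer are all hypothetical choices made for illustration. The sketch assumes GCC or Clang, since it relies on the `__builtin_prefetch` compiler builtin.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u  /* hypothetical table size, must be a power of two */
#define BATCH 8u          /* hypothetical number of lookups issued together */

/* Hypothetical per-flow record; not taken from the paper. */
struct flow_entry {
    uint64_t key;      /* 64-bit flow key, compared in one instruction */
    uint64_t packets;  /* packets seen for this key */
};

static struct flow_entry flow_table[TABLE_SIZE];

/* Illustrative 64-bit mixer mapping a key to a table slot. */
static inline uint32_t bucket_of(uint64_t key)
{
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (uint32_t)(key & (TABLE_SIZE - 1u));
}

/*
 * Count packets per flow for n keys, processing them in batches.
 * Pass 1 computes every slot in the batch and prefetches it, so the
 * independent cache misses overlap instead of being served one by one.
 * Pass 2 touches the slots, which by then are likely to be in cache.
 * The table is direct-mapped: a colliding key evicts the old entry.
 */
static void count_batch(const uint64_t *keys, size_t n)
{
    uint32_t idx[BATCH];

    for (size_t base = 0; base < n; base += BATCH) {
        size_t m = (n - base < BATCH) ? (n - base) : BATCH;

        for (size_t i = 0; i < m; i++) {
            idx[i] = bucket_of(keys[base + i]);
            /* rw = 1 (will write), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&flow_table[idx[i]], 1, 3);
        }

        for (size_t i = 0; i < m; i++) {
            struct flow_entry *e = &flow_table[idx[i]];
            if (e->key != keys[base + i]) {
                e->key = keys[base + i];  /* evict on collision */
                e->packets = 0;
            }
            e->packets++;
        }
    }
}
```

Whether batching of this kind pays off depends on the table size relative to the cache and on the batch size, so any gain would need to be measured on the target machine.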