Title :
High-Performance Traffic Classification on GPU
Author :
Shijie Zhou ; Nittoor, P.R. ; Prasanna, V.K.
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
Traffic classification is an essential task in network management. Recently, there has been a new trend in exploring Graphics Processing Units (GPUs) for network applications. These applications typically do not perform floating point operations, and obtaining speedup can be challenging. In this paper, we design a high-performance traffic classifier based on an alternate representation of the C4.5 decision-tree algorithm and implement it using Compute Unified Device Architecture (CUDA). To remedy the unbalanced nature of the decision-trees arising in traffic classification, we convert the C4.5 decision-tree into a set of completely balanced range-trees. Classification is performed by searching the range-trees and merging the search results. We optimize our design by storing the range-trees in shared memory as compact arrays without explicit pointers. By exploiting thread-level parallelism, we develop throughput-optimized as well as latency-optimized designs. Experimental results show that for a typical decision-tree containing 128 leaf nodes and 6 features, our design achieves a throughput of over 1600 million classifications per second (MCPS). Compared with the state-of-the-art multi-core implementation, our design demonstrates a 16x improvement in throughput. We also demonstrate similar performance improvements on a variety of decision-trees that differ in number of leaf nodes, tree structure, and number of features.
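The conversion described in the abstract can be illustrated with a small CPU-side sketch. It is written in Python rather than CUDA and is not the authors' implementation. It assumes one range-tree per feature, each stored as a pointer-free array in breadth-first order (children of node k at 2k+1 and 2k+2), and it assumes that "merging the search results" means a bitwise AND of per-range leaf bit masks; the abstract does not spell out either detail. The toy tree and all names (`TREE`, `bfs_layout`, `search`, `build`, `classify`, `walk`) are hypothetical.

```python
INF = float("inf")

# Hypothetical toy decision tree, deliberately unbalanced. An internal node is
# (feature, threshold, left, right); go left when x[feature] <= threshold.
# A leaf is a class-label string.
TREE = (0, 100, (1, 50, "A", "B"), (0, 200, "C", (1, 80, "D", "E")))

def walk(node, x):
    """Reference classifier: plain pointer-chasing walk of the decision tree."""
    while not isinstance(node, str):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def leaf_boxes(node, box, out):
    """Collect (label, per-feature (lo, hi] interval) for every leaf."""
    if isinstance(node, str):
        out.append((node, list(box)))
        return
    f, t, left, right = node
    lo, hi = box[f]
    box[f] = (lo, min(hi, t)); leaf_boxes(left, box, out)
    box[f] = (max(lo, t), hi); leaf_boxes(right, box, out)
    box[f] = (lo, hi)

def bfs_layout(sorted_keys):
    """Pad keys to 2^d - 1 entries and store a perfect BST in breadth-first order."""
    depth = 0
    while (1 << depth) - 1 < len(sorted_keys):
        depth += 1
    n = (1 << depth) - 1
    it = iter(list(sorted_keys) + [INF] * (n - len(sorted_keys)))
    out = [INF] * n
    def fill(k):  # in-order fill places sorted keys at BST positions
        if k < n:
            fill(2 * k + 1); out[k] = next(it); fill(2 * k + 2)
    fill(0)
    return out, depth

def search(keys, depth, x):
    """Fixed-depth, pointer-free descent; returns the index of the range holding x."""
    k = 0
    for _ in range(depth):
        k = 2 * k + 1 + (x > keys[k])
    return k - len(keys)

def build(tree, num_features):
    """Convert the decision tree into one (keys, depth, masks) table per feature."""
    leaves = []
    leaf_boxes(tree, [(-INF, INF)] * num_features, leaves)
    tables = []
    for f in range(num_features):
        cuts = sorted({b[f][s] for _, b in leaves for s in (0, 1)} - {-INF, INF})
        bounds = [-INF] + cuts + [INF]
        masks = []
        for i in range(len(bounds) - 1):
            m = 0
            for j, (_, b) in enumerate(leaves):
                # Bit j is set if leaf j's interval covers the whole range i.
                if b[f][0] <= bounds[i] and bounds[i + 1] <= b[f][1]:
                    m |= 1 << j
            masks.append(m)
        tables.append(bfs_layout(cuts) + (masks,))
    return [label for label, _ in leaves], tables

def classify(labels, tables, x):
    """Search every feature's range-tree, then AND the leaf masks (assumed merge)."""
    mask = (1 << len(labels)) - 1
    for value, (keys, depth, masks) in zip(x, tables):
        mask &= masks[search(keys, depth, value)]
    return labels[mask.bit_length() - 1]

labels, tables = build(TREE, 2)
for a in (0, 100, 101, 200, 201, 999):
    for b in (0, 50, 51, 80, 81, 999):
        assert classify(labels, tables, (a, b)) == walk(TREE, (a, b))
```

In this sketch every lookup runs the same fixed number of steps regardless of the original tree's shape, which is the kind of regularity that suits GPU threads; the per-feature searches are also independent of one another, so they could in principle be assigned to separate threads.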
Keywords :
computer network management; decision trees; floating point arithmetic; graphics processing units; pattern classification; telecommunication traffic; tree searching; C4.5 decision-tree algorithm; CUDA; Compute Unified Device Architecture; GPU; compact arrays; completely balanced range-trees; explicit pointers; floating point operations; graphics processing unit; high-performance traffic classification; latency-optimized design; leaf nodes; multicore implementation comparison; network applications; network management; performance improvement; range-tree searching; search result merging; shared memory; thread level parallelism; throughput-optimized design; tree structure; Accuracy; Classification algorithms; Feature extraction; Graphics processing units; Instruction sets; Ports (Computers); Throughput; CUDA; GPU; High-Performance; Traffic Classification;
Conference_Title :
Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on
Conference_Location :
Jussieu
DOI :
10.1109/SBAC-PAD.2014.48