Title :
Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU
Author :
Shenshen Liang ; Ying Liu ; Cheng Wang ; Liheng Jian
Author_Institution :
Chinese Acad. of Sci., Grad. Univ., Beijing, China
Abstract :
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high performance computing for general-purpose applications. Due to GPU´s tremendous computing capability, it has emerged as the co-processor of CPU to achieve a high overall throughput. CUDA programming model provides the programmers adequate C language like APIs to better exploit the parallel power of the GPU. K-nearest neighbor (KNN) is a widely used classification technique and has significant applications in various domains, especially in text classification. The computational-intensive nature of KNN requires a high performance implementation. In this paper, we propose CUKNN, a CUDA-based parallel implementation of KNN. It launches two CUDA kernels, distance calculation kernel and selecting kernel. In the distance calculation kernel, a great number of concurrent CUDA threads are issued, where each thread performs the calculation between the query object and a reference object; in the selecting kernel, threads in a block find the local-k nearest neighbors of the query object concurrently, and then a thread is invoked to find the global-k nearest neighbors out of the queues of local-k neighbors. Various CUDA optimization techniques are applied to maximize the utilization of GPU. We evaluate our implementation by using synthetic dataseis and a real physical simulation dataset. The experimental results demonstrate that CUKNN outperforms the serial KNN on an HP xw8600 workstation significantly, achieving up to 46.7IX speedup on the synthetic dataseis and 42.49X on the physical simulation dataset including I/O cost. It also shows good scalability when varying the number of dimensions of the reference dataset, the number of objects in the reference dataset, and the number of objects in the query dataset.
Keywords :
C language; computer graphic equipment; coprocessors; optimisation; pattern classification; C language; CUDA optimization techniques; CUDA-enabled GPU; KNN; classification technique; coprocessor; graphics processing units; parallel implementation; parallel k-nearest neighbor algorithm; query dataset; text classification; Classification algorithms; Computational modeling; Graphics processing unit; Instruction sets; Kernel; Nearest neighbor searches; Programming;
Conference_Titel :
Web Society (SWS), 2010 IEEE 2nd Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6356-5
DOI :
10.1109/SWS.2010.5607480