Title :
Parallel Outlier Detection Using KD-Tree Based on MapReduce
Author :
He, Qing ; Ma, Yunlong ; Wang, Qun ; Zhuang, Fuzhen ; Shi, Zhongzhi
Author_Institution :
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
fDate :
Nov. 29 2011-Dec. 1 2011
Abstract :
Distributed and parallel algorithms have attracted considerable research interest in recent decades as a way to handle large-scale data sets in real-world applications. In this paper, we focus on a parallel implementation of a KD-Tree based outlier detection method for large-scale data sets. The KD-Tree based method is one of the state-of-the-art outlier detection methods and has proven to be effective. However, it still cannot process large-scale data sets efficiently because of its serial implementation. Building on MapReduce, a current and powerful parallel programming framework, we propose a parallel KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree according to the evaluation criteria of scaleup, speedup, and sizeup.
Keywords :
data handling; data mining; parallel algorithms; parallel programming; tree data structures; KD-tree based outlier detection algorithm; MapReduce based KD-tree; PKDTree; distributed algorithm; large-scale data handling; parallel algorithm; parallel outlier detection; parallel programming framework; real-world application; serial implementation; state-of-the-art outlier detection method; Algorithm design and analysis; Arrays; Computers; Detection algorithms; Parallel algorithms; Runtime; Smoothing methods; Data mining; KD-Tree; MapReduce; Parallel Outlier Detection;
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4673-0090-2
DOI :
10.1109/CloudCom.2011.20