DocumentCode
3063144
Title
Parallel Outlier Detection Using KD-Tree Based on MapReduce
Author
He, Qing ; Ma, Yunlong ; Wang, Qun ; Zhuang, Fuzhen ; Shi, Zhongzhi
Author_Institution
Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
fYear
2011
fDate
Nov. 29 2011-Dec. 1 2011
Firstpage
75
Lastpage
80
Abstract
Distributed and parallel algorithms have attracted considerable research interest in recent decades as a way to handle large-scale data sets in real-world applications. In this paper, we focus on a parallel implementation of a KD-Tree based outlier detection method for large-scale data sets. As one of the state-of-the-art outlier detection methods, the KD-Tree based method has been proven to be an effective algorithm. However, it still cannot process large-scale data sets efficiently because of its serial implementation. Building on MapReduce, a current and powerful parallel programming framework, we propose a parallel KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree according to the evaluation criteria of scaleup, speedup, and sizeup.
Keywords
data handling; data mining; parallel algorithms; parallel programming; tree data structures; KD-tree based outlier detection algorithm; MapReduce based KD-tree; PKDTree; distributed algorithm; large-scale data handling; parallel algorithm; parallel outlier detection; parallel programming framework; real-world application; serial implementation; state-of-the-art outlier detection method; Algorithm design and analysis; Arrays; Computers; Detection algorithms; Parallel algorithms; Runtime; Smoothing methods; Data mining; KD-Tree; MapReduce; Parallel Outlier Detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on
Conference_Location
Athens
Print_ISBN
978-1-4673-0090-2
Type
conf
DOI
10.1109/CloudCom.2011.20
Filename
6133129
Link To Document