  • DocumentCode
    3063144
  • Title
    Parallel Outlier Detection Using KD-Tree Based on MapReduce
  • Author
    He, Qing; Ma, Yunlong; Wang, Qun; Zhuang, Fuzhen; Shi, Zhongzhi
  • Author_Institution
    Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China
  • fYear
    2011
  • fDate
    Nov. 29, 2011 - Dec. 1, 2011
  • Firstpage
    75
  • Lastpage
    80
  • Abstract
    Distributed and parallel algorithms have attracted a great deal of interest and research in recent decades as a means of handling large-scale data sets in real-world applications. In this paper, we focus on a parallel implementation of the KD-Tree based outlier detection method for large-scale data sets. The KD-Tree based method is one of the state-of-the-art outlier detection methods and has been proven effective. However, because its implementation is serial, it still cannot process large-scale data sets efficiently. Building on MapReduce, a popular and powerful parallel programming framework, we propose a parallel implementation of the KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree according to the evaluation criteria of scaleup, speedup and sizeup.
  • Keywords
    data handling; data mining; parallel algorithms; parallel programming; tree data structures; KD-tree based outlier detection algorithm; MapReduce based KD-tree; PKDTree; distributed algorithm; large-scale data handling; parallel algorithm; parallel outlier detection; parallel programming framework; real-world application; serial implementation; state-of-the-art outlier detection method; Algorithm design and analysis; Arrays; Computers; Detection algorithms; Parallel algorithms; Runtime; Smoothing methods; Data mining; KD-Tree; MapReduce; Parallel Outlier Detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on
  • Conference_Location
    Athens
  • Print_ISBN
    978-1-4673-0090-2
  • Type
    conf
  • DOI
    10.1109/CloudCom.2011.20
  • Filename
    6133129