Title :
An Efficient Reference-Based Approach to Outlier Detection in Large Datasets
Author :
Pei, Yaling ; Zaïane, Osmar R. ; Gao, Yong
Author_Institution :
Dept. of Comput. Sci., Alberta Univ., Edmonton, AB
Abstract :
A bottleneck to detecting distance and density based outliers is that a nearest-neighbor search is required for each of the data points, resulting in a quadratic number of pairwise distance evaluations. In this paper, we propose a new method that uses the relative degree of density with respect to a fixed set of reference points to approximate the degree of density defined in terms of nearest neighbors of a data point. The running time of our algorithm based on this approximation is 0(Rn log n) where n is the size of dataset and R is the number of reference points. Candidate outliers are ranked based on the outlier score assigned to each data point. Theoretical analysis and empirical studies show that our method is effective, efficient, and highly scalable to very large datasets.
Keywords :
approximation theory; computational complexity; pattern recognition; search problems; density based outliers; density degree approximation; distance detection; large datasets; nearest-neighbor search; outlier detection; pairwise distance evaluations; reference-based approach; Approximation algorithms; Art; Clustering algorithms; Computer science; Data mining; Distributed computing; Nearest neighbor searches; Spatial indexes; Statistical distributions; Tree data structures;
Conference_Titel :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2701-7
DOI :
10.1109/ICDM.2006.17