Title :
Parallel algorithms for distance-based and density-based outliers
Author :
Lozano, Elio ; Acuña, E.
Author_Institution :
Dept. of Math., Puerto Rico Univ., Mayaguez, Puerto Rico
Abstract :
An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Outlier detection has many applications, such as data cleaning, fraud detection, and network intrusion detection. Outliers can indicate individuals or groups whose behavior differs markedly from that of most individuals in the dataset. In this paper we design two parallel algorithms. The first finds distance-based outliers using nested loops combined with randomization and a pruning rule. The second detects density-based local outliers. Both algorithms use data parallelism, and we show that both reach near-linear speedup. The algorithms are tested on four real-world datasets from the Machine Learning Database Repository at UCI.
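The abstract names three ingredients for the distance-based algorithm: nested loops, randomization, and a pruning rule. The block below is a minimal sequential sketch of how those pieces can fit together. It is not the authors' parallel implementation. The function name top_outliers, the parameters k and n, and the choice of scoring a point by the distance to its k-th nearest neighbour are all assumptions made for illustration.

```python
import heapq
import math
import random


def top_outliers(points, k, n, seed=0):
    """Return up to n (score, index) pairs, highest score first.

    Assumed score: distance from a point to its k-th nearest neighbour.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # randomization: close neighbours tend to show up early

    top = []      # min-heap of (score, index): the n best outliers found so far
    cutoff = 0.0  # score of the weakest outlier currently kept in `top`

    for i in order:                      # outer loop: candidate outlier
        nearest = []                     # max-heap (negated) of the k smallest distances
        pruned = False
        for j in order:                  # inner loop: scan for neighbours of i
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule: the running k-th neighbour distance can only shrink,
            # so once it falls below the cutoff, i cannot enter the top n.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]

    return sorted(top, reverse=True)
```

For example, top_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, n=1) reports the point at index 4 as the single strongest outlier. One plausible way to apply data parallelism to such a scheme is to split the outer loop's candidates across workers and share the cutoff between them, but that is a guess at the general idea and the record does not describe the authors' actual partitioning.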
Keywords :
data analysis; parallel algorithms; data cleaning; data parallelism; density-based local outliers; distance-based outliers; fraud detection; nested loops; network intrusion; outlier detection; pruning rule; Algorithm design and analysis; Cleaning; Data mining; Databases; Intrusion detection; Machine learning algorithms; Mathematics; Nearest neighbor searches; Parallel algorithms; Testing
Conference_Title :
Data Mining, Fifth IEEE International Conference on
Print_ISBN :
0-7695-2278-5
DOI :
10.1109/ICDM.2005.116