DocumentCode :
2961974
Title :
Fast parallel outlier detection for categorical datasets using MapReduce
Author :
Koufakou, Anna ; Secretan, Jimmy ; Reeder, John ; Cardona, Kelvin ; Georgiopoulos, Michael
Author_Institution :
Sch. of EECS, Univ. of Central Florida, Orlando, FL
fYear :
2008
fDate :
1-8 June 2008
Firstpage :
3298
Lastpage :
3304
Abstract :
Outlier detection has received considerable attention in many applications, such as detecting network attacks or credit card fraud. The massive datasets currently available for mining in some of these outlier detection applications require large parallel systems, and consequently parallelizable outlier detection methods. Most existing outlier detection methods assume that all of the attributes of a dataset are numerical, usually have quadratic time complexity with respect to the number of points in the dataset, and often require multiple scans of the dataset. In this paper, we propose a fast parallel outlier detection strategy based on the Attribute Value Frequency (AVF) approach, a high-speed, scalable outlier detection method for categorical data that is inherently easy to parallelize. Our proposed solution, MR-AVF, is based on the MapReduce paradigm for parallel programming, which offers load balancing and fault tolerance. MR-AVF is particularly simple to develop and is shown to be highly scalable with respect to the number of cluster nodes.
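The abstract names two ideas: AVF scoring and its MapReduce parallelization. Below is a minimal single-process sketch, under stated assumptions, of how those might fit together. It assumes the usual AVF rule: a record x_i with m categorical attributes gets the score AVF(x_i) = (1/m) * sum over l = 1..m of f(x_il), where f(x_il) is the number of records sharing value x_il in attribute l, and the k records with the lowest scores are flagged as outliers. The function names (map_counts, reduce_counts, avf_score, mr_avf) are hypothetical. The shuffle step is simulated by chaining all mapper output into one reducer, whereas a real MR-AVF job would distribute these phases across cluster nodes.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Tuple[str, ...]
Key = Tuple[int, str]  # (attribute index, attribute value)


def map_counts(record: Record) -> Iterable[Tuple[Key, int]]:
    """Phase 1 map (assumed): emit ((attribute index, value), 1) per attribute."""
    for attr_index, value in enumerate(record):
        yield (attr_index, value), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Phase 1 reduce (assumed): sum the 1s per key to get value frequencies."""
    counts: Counter = Counter()
    for key, one in pairs:
        counts[key] += one
    return dict(counts)


def avf_score(record: Record, freq: Dict[Key, int]) -> float:
    """Phase 2 (assumed): average frequency of the record's attribute values."""
    m = len(record)
    return sum(freq[(i, v)] for i, v in enumerate(record)) / m


def mr_avf(records: Sequence[Record], k: int) -> List[Tuple[int, float]]:
    """Return the k (record index, AVF score) pairs with the lowest scores."""
    # Simulated shuffle: every mapper's output goes to a single reducer here.
    freq = reduce_counts(chain.from_iterable(map_counts(r) for r in records))
    scores = [(idx, avf_score(r, freq)) for idx, r in enumerate(records)]
    # Low AVF score = rare attribute values = outlier candidate.
    return sorted(scores, key=lambda s: s[1])[:k]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
    print(mr_avf(data, 2))  # [(3, 1.0), (2, 2.0)]
```

In this toy dataset, record 3 holds two values that each occur once, so its score is (1 + 1) / 2 = 1.0, the lowest. Because the frequency count is a per-key sum, it splits naturally across mappers and reducers. This is the property the abstract describes as making AVF "inherently easy to parallelize."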
Keywords :
computational complexity; fault tolerant computing; parallel programming; resource allocation; secondary ion mass spectra; security of data; MapReduce paradigm; attribute value frequency; categorical datasets; credit card fraud; fast parallel outlier detection; fault tolerance; load balancing; network attacks detection; parallel systems; quadratic time complexity; scalable outlier detection; Breast cancer; Frequency; Network servers; Neural networks;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location :
Hong Kong
ISSN :
1098-7576
Print_ISBN :
978-1-4244-1820-6
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2008.4634266
Filename :
4634266