DocumentCode :
2485354
Title :
A Scalable and Efficient Outlier Detection Strategy for Categorical Data
Author :
Koufakou, A. ; Ortiz, E.G. ; Georgiopoulos, M. ; Anagnostopoulos, G.C. ; Reynolds, K.M.
Author_Institution :
Univ. of Central Florida, Orlando
Volume :
2
fYear :
2007
fDate :
29-31 Oct. 2007
Firstpage :
210
Lastpage :
217
Abstract :
Outlier detection has received significant attention in many applications, such as detecting credit card fraud or network intrusions. Most existing research focuses on numerical datasets and cannot be directly applied to categorical datasets, where there is little sense in calculating distances among data points. Furthermore, a number of outlier detection methods require quadratic time with respect to the dataset size and usually multiple dataset scans. These characteristics are undesirable for large datasets, potentially scattered over multiple distributed sites. In this paper, we introduce Attribute Value Frequency (AVF), a fast and scalable outlier detection strategy for categorical data. AVF scales linearly with the number of data points and attributes, and relies on a single data scan. AVF is compared with a list of representative outlier detection approaches that had not previously been contrasted against each other. Our proposed solution is experimentally shown to be significantly faster, and as effective at discovering outliers.
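The abstract does not spell out how AVF scores a record. The following is a minimal Python sketch that assumes the commonly cited AVF formulation: a record's score is the mean, over its m attributes, of how often that record's value occurs in the corresponding column, and the k lowest-scoring records are reported as outliers. The function names `avf_scores` and `top_k_outliers` are hypothetical illustrations, not the authors' code, and the sketch assumes every record has the same number of attributes.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def avf_scores(rows: Sequence[Sequence[Hashable]]) -> List[float]:
    """Return the average attribute-value frequency of each row.

    Lower scores mean the row is built from rarer values, i.e. more
    likely to be an outlier under the assumed AVF formulation.
    """
    if not rows:
        return []
    m = len(rows[0])  # assumed: all rows have m categorical attributes
    # One counter per attribute: how often each value occurs in that column.
    counts = [Counter() for _ in range(m)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1
    # Score each row by the mean frequency of its own attribute values.
    return [sum(counts[j][v] for j, v in enumerate(row)) / m for row in rows]


def top_k_outliers(rows: Sequence[Sequence[Hashable]], k: int) -> List[Tuple[int, float]]:
    """Return (row index, score) for the k rows with the lowest AVF scores."""
    scores = avf_scores(rows)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    return [(i, scores[i]) for i in order[:k]]
```

For example, with the toy records `("red", "small", "yes")` (twice), `("red", "large", "yes")` and `("blue", "small", "no")`, the fourth record combines two rare values and receives the lowest score (5/3), so `top_k_outliers(rows, 1)` reports index 3. This sketch counts frequencies in one pass and then scores rows held in memory; each pass touches every one of the n × m attribute values once, giving O(nm) time, which is consistent with the linear scaling the abstract claims.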
Keywords :
data mining; attribute value frequency; categorical dataset; credit card fraud; network intrusion; outlier detection; Artificial intelligence; Cleaning; Clustering algorithms; Credit cards; Diseases; Explosions; Frequency; Intrusion detection; Scalability; Scattering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location :
Patras
ISSN :
1082-3409
Print_ISBN :
978-0-7695-3015-4
Type :
conf
DOI :
10.1109/ICTAI.2007.125
Filename :
4410382