DocumentCode :
3577020
Title :
Bivariate probability-based anomaly detection
Author :
Hua Lou ; Ye Zhu
Author_Institution :
Changzhou Coll. of Inf. Technol., Changzhou, China
fYear :
2014
Firstpage :
1
Lastpage :
6
Abstract :
Statistical techniques play a crucial role in anomaly detection. Although they are usually simple and can be trained without supervision, they face three challenges: parametric techniques usually assume that the data follow a specific distribution; existing histogram-based techniques consider each attribute individually and so cannot capture interactions between attributes; and some statistical techniques still need labeled data for training or validation. To overcome these drawbacks, this paper proposes a different statistical method for assessing data instances. The proposed method, the Bivariate Probability based Anomaly Score (BPAS) algorithm, builds an ensemble of Bivariate Probability (BP) models for a given data set, and each model calculates the probability distribution for combinations of intervals from two attributes. Anomalies are detected when they fall in low-probability combinations. The empirical evaluation shows that BPAS compares favorably with LOF, ORCA and iForest on different types of real data sets in terms of AUC. Its performance is relatively stable when key parameters change. BPAS also performs well on categorical data sets and on data sets that contain normal instances only. Furthermore, it has a linear time complexity of O(n), which is much lower than that of distance-based and density-based methods. Thus BPAS has the potential to become an efficient anomaly detector for high-volume, high-dimensional databases.
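The idea summarized in the abstract (an ensemble of models, each estimating the probability of interval combinations from two attributes, with low-probability combinations flagged as anomalous) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how intervals are built, how attribute pairs are chosen, or how per-model probabilities are combined. The equal-width binning, the random choice of pairs, the Laplace smoothing, the averaged negative log-probability score, and all names (`bin_index`, `fit_bp_models`, `anomaly_score`, `n_bins`, `n_models`) are assumptions made for illustration, and the sketch handles numeric attributes only.

```python
import itertools
import math
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width interval index in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    # Clamp so the maximum value and out-of-range values land in an edge bin.
    return min(max(idx, 0), n_bins - 1)


def fit_bp_models(data, n_bins=5, n_models=10, seed=0):
    """Count joint interval frequencies for up to n_models attribute pairs.

    Assumption: pairs are drawn at random; the source does not specify this.
    """
    ranges = [(min(col), max(col)) for col in zip(*data)]
    pairs = list(itertools.combinations(range(len(ranges)), 2))
    random.Random(seed).shuffle(pairs)
    models = []
    for a, b in pairs[:n_models]:
        counts = Counter(
            (bin_index(row[a], *ranges[a], n_bins),
             bin_index(row[b], *ranges[b], n_bins))
            for row in data
        )
        models.append((a, b, counts))
    return {"ranges": ranges, "n_bins": n_bins, "n": len(data), "models": models}


def anomaly_score(model, row):
    """Average negative log-probability over all pair models (higher = more anomalous).

    Assumption: this aggregation is illustrative, not the paper's scoring rule.
    """
    ranges, n_bins, n = model["ranges"], model["n_bins"], model["n"]
    total = 0.0
    for a, b, counts in model["models"]:
        cell = (bin_index(row[a], *ranges[a], n_bins),
                bin_index(row[b], *ranges[b], n_bins))
        # Laplace smoothing keeps the probability of an unseen cell above zero.
        p = (counts[cell] + 1) / (n + n_bins * n_bins)
        total -= math.log(p)
    return total / len(model["models"])
```

Under these assumptions, fitting makes one pass over the data per attribute pair, so for a fixed number of pairs the cost grows linearly with the number of instances, in line with the O(n) complexity stated in the abstract. An instance whose attribute values are individually common but rarely occur together receives a higher score than one that falls in a well-populated cell.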
Keywords :
computational complexity; database management systems; security of data; statistical analysis; statistical distributions; AUC; BPAS algorithm; LOF; ORCA; anomaly detector; bivariate probability based anomaly score algorithm; bivariate probability-based anomaly detection; categorical data sets; data instances; density-based methods; distance-based methods; high dimensional databases; histogram-based techniques; iForest; labeled data; linear time complexity; parametric techniques; probability distribution; statistic method; statistical techniques; Data models; Detectors; Histograms; Probability; Runtime; Satellites; Training; Anomaly detection; BPAS; Bivariate Probability; iForest;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Behavior, Economic and Social Computing (BESC), 2014 International Conference on
Type :
conf
DOI :
10.1109/BESC.2014.7059512
Filename :
7059512