DocumentCode :
866790
Title :
Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences
Author :
Silva, Jorge ; Willett, Rebecca
Author_Institution :
Duke Univ., Durham, NC
Volume :
31
Issue :
3
fYear :
2009
fDate :
3/1/2009 12:00:00 AM
Firstpage :
563
Lastpage :
569
Abstract :
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs: they allow edges to connect more than two vertices simultaneously. A variational expectation-maximization algorithm is presented for detecting anomalies directly on the hypergraph domain, without any feature selection or dimensionality reduction. The resulting estimate can be used to calculate a measure of anomalousness based on the false discovery rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and the method requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and the results show that it can outperform other state-of-the-art methods.
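The abstract does not specify the underlying model, so the Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It encodes each co-occurrence event as a hyperedge (the set of participant indices out of p) and fits a k-component mixture of independent Bernoulli distributions with ordinary EM, a swapped-in stand-in for the paper's variational EM. It then turns each test event's likelihood into an empirical p-value against the training events and applies the standard Benjamini-Hochberg procedure to flag anomalies at a target false discovery rate q. The function names, the farthest-first seeding, and the add-one smoothing are all hypothetical. Each EM iteration costs O(nkp), which for fixed k and iteration count is linear in n and p, the same scaling the abstract reports for the authors' method.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

# One co-occurrence event = one hyperedge: the set of participant indices in [0, p).
Hyperedge = FrozenSet[int]


def log_lik(edge: Hyperedge, theta: Sequence[float]) -> float:
    """log P(edge) when participant v joins independently with probability theta[v]; O(p)."""
    return sum(math.log(t) if v in edge else math.log1p(-t)
               for v, t in enumerate(theta))


def mixture_log_lik(edge: Hyperedge, weights: Sequence[float],
                    thetas: Sequence[Sequence[float]]) -> float:
    """log sum_j w_j P(edge | theta_j), via the log-sum-exp trick."""
    logs = [math.log(w) + log_lik(edge, th) for w, th in zip(weights, thetas)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))


def fit_mixture(edges: List[Hyperedge], p: int, k: int = 2,
                iters: int = 30) -> Tuple[List[float], List[List[float]]]:
    """Plain EM (not variational) for a k-component Bernoulli mixture; O(n*k*p) per iteration."""
    n = len(edges)
    # Hypothetical farthest-first seeding: each new seed maximizes its
    # smallest symmetric-difference distance to the seeds chosen so far.
    seeds = [edges[0]]
    while len(seeds) < k:
        seeds.append(max(edges, key=lambda e: min(len(e ^ s) for s in seeds)))
    thetas = [[0.75 if v in s else 0.25 for v in range(p)] for s in seeds]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each training event.
        resp = []
        for e in edges:
            logs = [math.log(w) + log_lik(e, th) for w, th in zip(weights, thetas)]
            m = max(logs)
            ex = [math.exp(x - m) for x in logs]
            total = sum(ex)
            resp.append([x / total for x in ex])
        # M-step with add-one smoothing, so every theta stays strictly in (0, 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + 1.0) / (n + k)
            counts = [0.0] * p
            for e, r in zip(edges, resp):
                for v in e:  # touch only the participants present in the event
                    counts[v] += r[j]
            thetas[j] = [(c + 1.0) / (nj + 2.0) for c in counts]
    return weights, thetas


def empirical_p_values(train: List[Hyperedge], test: List[Hyperedge],
                       weights: Sequence[float],
                       thetas: Sequence[Sequence[float]]) -> List[float]:
    """Smoothed fraction of training events that are at most as likely as each test event."""
    ref = [mixture_log_lik(e, weights, thetas) for e in train]
    out = []
    for e in test:
        ll = mixture_log_lik(e, weights, thetas)
        out.append((1 + sum(r <= ll for r in ref)) / (len(ref) + 1))
    return out


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[bool]:
    """Standard Benjamini-Hochberg step-up rule at target false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    flagged = set(order[:cutoff])
    return [i in flagged for i in range(m)]


# Toy data: two groups of participants that co-occur only within their own group.
base = [frozenset(s) for s in ({0, 1}, {0, 2}, {1, 2}, {0, 1, 2},
                               {3, 4}, {3, 5}, {4, 5}, {3, 4, 5})]
train = base * 5
w, th = fit_mixture(train, p=6, k=2)
pv = empirical_p_values(train, [frozenset({0, 1}), frozenset({0, 3, 5})], w, th)
print(benjamini_hochberg(pv, q=0.1))  # [False, True]: only the cross-group event is flagged
```

Unlike the authors' method, which the abstract describes as needing no tuning, bandwidth, or regularization parameters, this sketch does require choosing k, iters, and q, along with its seeding and smoothing heuristics.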
Keywords :
computational complexity; data analysis; expectation-maximisation algorithm; graph theory; spatial data structures; unsupervised learning; variational techniques; Enron email database; data hypergraph representation; dimensionality reduction; false discovery rate; feature selection; high-dimensional multivariate co-occurrence data analysis; hypergraph-based anomaly detection; variational expectation-maximization algorithm; Anomaly detection; Co-occurrence data; False Discovery Rate; Unsupervised learning; Variational methods; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Pattern Recognition, Automated;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2008.232
Filename :
4626961