Title :
Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences
Author :
Silva, Jorge ; Willett, Rebecca
Author_Institution :
Duke Univ., Durham, NC
fDate :
3/1/2009 12:00:00 AM
Abstract :
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational expectation-maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the false discovery rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and requires no tuning, bandwidth or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.
Keywords :
computational complexity; data analysis; expectation-maximisation algorithm; graph theory; spatial data structures; unsupervised learning; variational techniques; Enron email database; computational complexity; data hypergraph representation; dimensionality reduction; false discovery rate; feature selection; high-dimensional multivariate co-occurrence data analysis; hypergraph-based anomaly detection; unsupervised learning; variational expectation-maximization algorithm; Anomaly detection; Co-occurrence data; False Discovery Rate; Unsupervised learning; Variational methods; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Pattern Recognition, Automated;
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2008.232