• DocumentCode
    81253
  • Title
    A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data
  • Author
    Ding, Qi ; Kolaczyk, Eric D.
  • Author_Institution
    Dept. of Math. & Stat., Boston Univ., Boston, MA, USA
  • Volume
    59
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    7419
  • Lastpage
    7433
  • Abstract
    Random projection is widely used as a method of dimension reduction. In recent years, its combination with standard techniques of regression and classification has been explored. Here, we examine its use for anomaly detection in high-dimensional settings, in conjunction with principal component analysis (PCA) and corresponding subspace detection methods. We assume a so-called spiked covariance model for the underlying data generation process and a Gaussian random projection. We adopt a hypothesis testing perspective of the anomaly detection problem, with the test statistic defined to be the magnitude of the residuals of a PCA analysis. Under the null hypothesis of no anomaly, we characterize the relative accuracy with which the mean and variance of the test statistic from compressed data approximate those of the corresponding test statistic from uncompressed data. Furthermore, under a suitable alternative hypothesis, we provide expressions that allow for a comparison of statistical power for detection. Finally, whereas these results correspond to the ideal setting in which the data covariance is known, we show that it is possible to obtain the same order of accuracy when the covariance of the compressed measurements is estimated using a sample covariance, as long as the number of measurements is of the same order of magnitude as the reduced dimensionality. We illustrate the practical impact of our results in the context of predicting volume anomalies in Internet traffic data.
  • Keywords
    Gaussian processes; covariance analysis; principal component analysis; regression analysis; security of data; telecommunication traffic; Gaussian random projection; Internet traffic data; anomaly detection problem; classification techniques; compressed PCA subspace method; compressed data; compressed measurements; data generation process; high-dimensional data; hypothesis testing perspective; null hypothesis; principal component analysis; regression techniques; relative accuracy; sample covariance; spiked covariance model; subspace detection methods; test statistic; uncompressed data; volume anomalies; Accuracy; Context; Internet; Monitoring; Principal component analysis; Standards; Testing; Anomaly detection; principal component analysis; random projection
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2013.2278017
  • Filename
    6578182
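
The test statistic described in the abstract (the magnitude of the residuals after projecting onto the leading principal subspace) can be illustrated with a minimal sketch. It is not the authors' implementation. The dimensions `p`, `m`, `k`, the spike values, the `1/sqrt(p)` scaling of the projection matrix, and the helper names `residual_statistic` and `spiked_covariance` are all assumptions chosen for illustration. The covariance is treated as known, matching the ideal setting the abstract mentions.

```python
import numpy as np


def residual_statistic(x, cov, k):
    """Squared norm of the part of x outside the top-k principal subspace of cov."""
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues are returned in ascending order
    top = eigvecs[:, -k:]                  # the k leading principal directions
    resid = x - top @ (top.T @ x)          # remove the principal-subspace component
    return float(resid @ resid)


def spiked_covariance(p, spikes, rng):
    """Covariance with len(spikes) large eigenvalues and unit eigenvalues elsewhere."""
    basis, _ = np.linalg.qr(rng.standard_normal((p, p)))   # random orthonormal basis
    eigs = np.ones(p)
    eigs[: len(spikes)] = spikes
    return (basis * eigs) @ basis.T, basis, eigs


rng = np.random.default_rng(0)
p, m, k = 200, 40, 3                       # hypothetical sizes: ambient, compressed, spikes
sigma, basis, eigs = spiked_covariance(p, [50.0, 30.0, 20.0], rng)

# One observation under the null hypothesis (no anomaly): x ~ N(0, sigma).
x = basis @ (np.sqrt(eigs) * rng.standard_normal(p))

# Gaussian random projection; the 1/sqrt(p) scaling is an assumption of this sketch.
phi = rng.standard_normal((m, p)) / np.sqrt(p)
y = phi @ x                                # compressed observation
sigma_y = phi @ sigma @ phi.T              # covariance of the compressed observation

q_full = residual_statistic(x, sigma, k)       # statistic from uncompressed data
q_comp = residual_statistic(y, sigma_y, k)     # statistic from compressed data
print(f"uncompressed: {q_full:.3f}  compressed: {q_comp:.3f}")
```

The two printed values are on different scales because of the projection scaling, so a single draw shows only how each statistic is computed. Comparing their means and variances, as the abstract describes, would require many draws.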