• DocumentCode
    2864323
  • Title
    An empirical Bayes approach to detect anomalies in dynamic multidimensional arrays
  • Author
    Agarwal, Deepak
  • Author_Institution
    AT&T Labs-Res., Florham Park, NJ, USA
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    We consider the problem of detecting anomalies in data that arise as multidimensional arrays, with each dimension corresponding to the levels of a categorical variable. In typical data mining applications, the number of cells in such arrays is usually large. Our primary focus is detecting anomalies by comparing information at the current time to historical data. Naive approaches advocated in the process control literature do not work well in this scenario because of the multiple testing problem: performing many statistical tests on the same data produces an excessive number of false positives. We use an empirical Bayes method that works by fitting a two-component Gaussian mixture to deviations at the current time. The approach scales to problems that involve monitoring a massive number of cells and is fast enough to be potentially useful in many streaming scenarios. Through simulation, we show the superiority of the method relative to a naive "per component error rate" procedure. A novel feature of our technique is the ability to suppress deviations that are merely the consequence of sharp changes in the marginal distributions. This research was motivated by the need to extract critical application information and business intelligence from the daily logs that accompany large-scale spoken dialog systems deployed by AT&T. We illustrate our method on one such system.
  • Keywords
    Bayes methods; Gaussian processes; data mining; Gaussian mixture; anomaly detection; business intelligence; dynamic multidimensional arrays; empirical Bayes approach; large-scale spoken dialog systems; Computational intelligence; Databases; Large-scale systems; Monitoring; Multidimensional systems; Performance evaluation; Process control; Statistics; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2005.22
  • Filename
    1565658
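
The abstract describes fitting a two-component Gaussian mixture to the deviations observed at the current time. The record gives no further detail, so the following is only a minimal sketch of that general idea, not the paper's procedure. It assumes, purely for illustration, that both components are zero-mean and differ only in scale (a narrow "null" component and a wide "anomaly" component), fits them by EM, and flags cells by their posterior probability of belonging to the wide component. The function names, the initial values, the 0.5 threshold, and the simulated data are all hypothetical.

```python
import numpy as np


def log_normal_pdf(z, sigma):
    """Log density of N(0, sigma^2) evaluated at z."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * (z / sigma) ** 2


def posterior_anomaly(z, p, s0, s1):
    """Posterior probability that each z comes from the wide component.

    Assumed model (not taken from the paper):
        z ~ (1 - p) * N(0, s0^2) + p * N(0, s1^2)
    Computed in log space to avoid underflow for large |z|.
    """
    log_null = np.log1p(-p) + log_normal_pdf(z, s0)
    log_anom = np.log(p) + log_normal_pdf(z, s1)
    return np.exp(log_anom - np.logaddexp(log_null, log_anom))


def fit_mixture(z, n_iter=500, tol=1e-8):
    """Estimate (p, s0, s1) by EM under the assumed scale-mixture model."""
    z = np.asarray(z, dtype=float)
    # Hypothetical starting values: robust scale for the null component,
    # a wider scale and a small weight for the anomaly component.
    s0 = 1.4826 * np.median(np.abs(z)) + 1e-12
    s1 = 3.0 * s0
    p = 0.1
    for _ in range(n_iter):
        # E-step: responsibility of the anomaly component for each cell.
        r = posterior_anomaly(z, p, s0, s1)
        # M-step: update the mixing weight and the two scales.
        p_new = float(np.clip(r.mean(), 1e-6, 1.0 - 1e-6))
        s1_new = float(np.sqrt(np.sum(r * z**2) / max(np.sum(r), 1e-12)))
        s0_new = float(np.sqrt(np.sum((1.0 - r) * z**2) / max(np.sum(1.0 - r), 1e-12)))
        converged = abs(p_new - p) < tol
        p, s0, s1 = p_new, s0_new, s1_new
        if converged:
            break
    return p, s0, s1


# Simulated deviations: mostly null cells plus a few large ones (made-up data).
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(0.0, 8.0, 50)])

p, s0, s1 = fit_mixture(z)
post = posterior_anomaly(z, p, s0, s1)
flagged = np.flatnonzero(post > 0.5)  # 0.5 is an arbitrary illustrative cutoff
```

Because every cell is scored against one shared fitted mixture rather than tested in isolation, the fraction of anomalous cells is estimated from the data itself, which is the usual motivation for empirical Bayes in multiple-testing settings.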