• DocumentCode
    1388353
  • Title

    Anomaly Detection for Discrete Sequences: A Survey

  • Author

    Chandola, Varun ; Banerjee, Arindam ; Kumar, Vipin

  • Author_Institution
    Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • Volume
    24
  • Issue
    5
  • fYear
    2012
  • fDate
    5/1/2012
  • Firstpage
    823
  • Lastpage
    839
  • Abstract
    This survey attempts to provide a comprehensive and structured overview of the existing research on the problem of detecting anomalies in discrete/symbolic sequences. The objective is to provide a global understanding of the sequence anomaly detection problem and of how existing techniques relate to each other. The key contribution of this survey is the classification of the existing research into three distinct categories, based on the problem formulation that each technique addresses. These problem formulations are: 1) identifying anomalous sequences with respect to a database of normal sequences; 2) identifying an anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence whose frequency of occurrence is anomalous. We show how each of these problem formulations is characteristically distinct from the others and discuss their relevance in various application domains. We review techniques from many disparate and disconnected application domains that address each of these formulations. Within each problem formulation, we group techniques into categories based on the nature of the underlying algorithm. For each category, we provide a basic anomaly detection technique and show how the existing techniques are variants of that basic technique. This approach shows how different techniques within a category are related to or different from each other. Our categorization reveals new variants and combinations that have not been investigated before for anomaly detection. We also discuss the relative strengths and weaknesses of different techniques. We show how techniques developed for one problem formulation can be adapted to solve a different formulation, thereby providing several novel adaptations for the different problem formulations. We also highlight the applicability of techniques that handle discrete sequences to other related areas, such as online anomaly detection and time series anomaly detection.
  • Keywords
    database management systems; pattern classification; security of data; anomalous sequence identification; anomalous subsequence identification; application domain; discrete sequences; normal sequence database; occurrence frequency; online anomaly detection; sequence anomaly detection problem; sequence pattern identification; symbolic sequences; time series anomaly detection; Computers; Databases; Hidden Markov models; Operating systems; Postal services; Probabilistic logic; Training; Discrete sequences; anomaly detection
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2010.235
  • Filename
    5645624