  • DocumentCode
    30306
  • Title
    Discovering the Top-k Unexplained Sequences in Time-Stamped Observation Data
  • Author
    Albanese, Massimiliano; Molinaro, Cristian; Persia, Fabio; Picariello, Antonio; Subrahmanian, V.S.
  • Author_Institution
    Dept. of Appl. Inf. Technol., George Mason Univ., Fairfax, VA, USA
  • Volume
    26
  • Issue
    3
  • fYear
    2014
  • fDate
    Mar-14
  • Firstpage
    577
  • Lastpage
    594
  • Abstract
    There are numerous applications where we wish to discover unexpected activities in a sequence of time-stamped observation data; for instance, we may want to detect inexplicable events in transactions at a website or in video of an airport tarmac. In this paper, we start with a known set A of activities (both innocuous and dangerous) that we wish to monitor. In addition, however, we wish to identify “unexplained” subsequences of an observation sequence, i.e., subsequences that are poorly explained by A (e.g., because they may contain occurrences of activities that have never been seen or anticipated before and are therefore not in A). We formally define the probability that a sequence of observations is unexplained (totally or partially) w.r.t. A. We develop efficient algorithms to identify the top-k totally and partially unexplained sequences w.r.t. A. These algorithms leverage theorems that enable us to speed up the search for totally/partially unexplained sequences. We describe experiments using real-world video and cyber-security data sets showing that our approach works well in practice in terms of both running time and accuracy.
  • Keywords
    data mining; probability; cyber-security data set; observation sequence; partially unexplained sequences; real-world video data set; time-stamped observation data; top-k totally sequences; top-k unexplained sequence discovery; unexpected activities discovery; Airports; Algorithm design and analysis; Computer security; Correlation; Hidden Markov models; Monitoring; Stochastic processes; Knowledge representation formalisms and methods; artificial intelligence; computing methodologies; knowledge base management
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Knowledge and Data Engineering
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2013.33
  • Filename
    6506840