• DocumentCode
    1600918
  • Title

    Casting out Demons: Sanitizing Training Data for Anomaly Sensors

  • Author

    Cretu, Gabriela F. ; Stavrou, Angelos ; Locasto, Michael E. ; Stolfo, Salvatore J. ; Keromytis, Angelos D.

  • Author_Institution
    Dept. of Comput. Sci., Columbia Univ., Columbia, NY
  • fYear
    2008
  • Firstpage
    81
  • Lastpage
    95
  • Abstract
    The efficacy of anomaly detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
  • Keywords
    learning (artificial intelligence); security of data; anomaly detection sensor; collaborative approach; high-quality training data set; sanitization phase; voting scheme; Application software; Casting; Computer science; Computer security; Data privacy; Data security; Intrusion detection; Telecommunication traffic; Traffic control; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Security and Privacy, 2008. SP 2008. IEEE Symposium on
  • Conference_Location
    Oakland, CA
  • ISSN
    1081-6011
  • Print_ISBN
    978-0-7695-3168-7
  • Type

    conf

  • DOI
    10.1109/SP.2008.11
  • Filename
    4531146
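
The sanitization phase described in the abstract (micro-models built on small slices of the training data, then combined by voting) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the toy micro-model (the set of items seen in a slice), the unweighted vote, and the `vote_threshold` parameter. The abstract states only that the approach is agnostic to the underlying AD algorithm.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def split_into_slices(data: Sequence[Hashable], slice_size: int) -> List[Sequence[Hashable]]:
    """Cut the training data into consecutive slices of at most slice_size items."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def train_micro_model(slice_: Sequence[Hashable]) -> Set[Hashable]:
    """Toy micro-model (assumption): remember every item seen in the slice."""
    return set(slice_)


def is_anomalous(model: Set[Hashable], item: Hashable) -> bool:
    """Provisional label from one micro-model: unseen items count as anomalous."""
    return item not in model


def sanitize(
    data: Sequence[Hashable], slice_size: int, vote_threshold: float
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split data into (sanitized, suspect) by an unweighted micro-model vote."""
    models = [train_micro_model(s) for s in split_into_slices(data, slice_size)]
    if not models:
        return [], []
    sanitized: List[Hashable] = []
    suspect: List[Hashable] = []
    for item in data:
        # Fraction of micro-models that label this item anomalous.
        score = sum(is_anomalous(m, item) for m in models) / len(models)
        if score > vote_threshold:
            suspect.append(item)
        else:
            sanitized.append(item)
    return sanitized, suspect


if __name__ == "__main__":
    traffic = ["a", "b", "a", "b", "EVIL", "a", "b", "a"]
    clean, flagged = sanitize(traffic, slice_size=2, vote_threshold=0.5)
    print("sanitized:", clean)
    print("suspect:", flagged)
```

In this toy run, the item that appears in only one slice is flagged by most micro-models and moved to the suspect set, while recurring items remain in the sanitized set. A real sensor would replace the set-membership model with its own AD algorithm.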