  • DocumentCode
    1843030
  • Title
    Active learning with neural networks for intrusion detection
  • Author
    Seliya, Naeem; Khoshgoftaar, Taghi M.
  • Author_Institution
    Comput. & Inf. Sci., Univ. of Michigan-Dearborn, Dearborn, MI, USA
  • fYear
    2010
  • fDate
    4-6 Aug. 2010
  • Firstpage
    49
  • Lastpage
    54
  • Abstract
    This paper presents a neural-network-based active learning procedure for computer network intrusion detection. Applying data mining and machine learning techniques to network intrusion detection often runs into the problem of very large training datasets. For example, the training dataset commonly used for the DARPA KDD-1999 offline intrusion detection project contained approximately five hundred thousand observations (a 10% sample of the original five million), which were used to build intrusion detection classification models. The practical problems associated with such a large dataset include very long model training times, redundant information, and increased complexity in understanding the domain-specific data. We demonstrate that a simple active learning procedure can dramatically reduce the size of the training data without significantly sacrificing the classification accuracy of the intrusion detection model. A case study of the DARPA KDD-1999 intrusion detection project is used in our work. The network traffic instances are classified into one of two categories: normal and attack. A comparison of the actively trained neural network model with a C4.5 decision tree indicated that the actively learned model had better generalization accuracy. In addition, the training data classification performance of the actively learned model was comparable to that of the C4.5 decision tree.
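    The abstract does not specify the selection rule used by the procedure. As a rough illustration only, here is one generic form of neural-network active learning, pool-based uncertainty sampling. It trains on a small labeled seed, queries the unlabeled instances whose predicted attack probability is nearest 0.5, adds them to the training set, and retrains. The one-hidden-layer NumPy network, the helper names (train_mlp, predict_proba, active_learn), the seed and batch sizes, the number of rounds, and the synthetic two-class data are all assumptions made for this sketch. It is not the authors' procedure or their KDD-1999 experimental setup.

```python
import numpy as np

# ASSUMPTION: a tiny hand-written network stands in for the paper's neural
# network; architecture and hyperparameters are illustrative only.


def train_mlp(X, y, hidden=8, epochs=300, lr=0.5, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, sigmoid output)
    with full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))    # P(attack | x)
        d_out = (p - y) / len(y)                    # dLoss/dlogit
        d_hidden = np.outer(d_out, w2) * (1.0 - H ** 2)
        w2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return W1, b1, w2, b2


def predict_proba(params, X):
    """Predicted probability that each row of X is an attack."""
    W1, b1, w2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))


def active_learn(X_pool, y_pool, seed_size=20, batch_size=10, rounds=5, seed=0):
    """Uncertainty sampling: grow the labeled set from a random seed by
    repeatedly adding the pool instances the current model is least sure of."""
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X_pool), size=seed_size, replace=False).tolist()
    for _ in range(rounds):
        params = train_mlp(X_pool[labeled], y_pool[labeled])
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if unlabeled.size == 0:
            break
        # Distance from 0.5: smaller means the model is less certain.
        margin = np.abs(predict_proba(params, X_pool[unlabeled]) - 0.5)
        query = unlabeled[np.argsort(margin)[:batch_size]]
        labeled.extend(query.tolist())
    final = train_mlp(X_pool[labeled], y_pool[labeled])
    return final, np.array(labeled)


if __name__ == "__main__":
    # Synthetic stand-in for "normal" vs "attack" traffic (NOT KDD-1999 data).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1.0, size=(500, 4)), rng.normal(1.0, size=(500, 4))])
    y = np.concatenate([np.zeros(500), np.ones(500)])
    params, labeled = active_learn(X, y)
    acc = np.mean((predict_proba(params, X) >= 0.5) == y)
    print(f"trained on {len(labeled)} of {len(X)} instances, accuracy {acc:.3f}")
```

    With the defaults above the final model is trained on 20 + 5 × 10 = 70 of the 1,000 pool instances. The point of the sketch is only the mechanism: the training set stays small because each round adds just the few instances near the current decision boundary.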
  • Keywords
    data mining; decision trees; learning (artificial intelligence); neural nets; security of data; C4.5 decision tree; DARPA KDD-1999; active learning; computer network intrusion detection; data mining; machine learning techniques; network traffic instances; neural networks; very large training dataset size; Artificial neural networks; Biological system modeling; Data models; Intrusion detection; Machine learning; Training; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2010 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4244-8097-5
  • Type
    conf
  • DOI
    10.1109/IRI.2010.5558967
  • Filename
    5558967