  • DocumentCode
    153706
  • Title
    Automatic Dataset Labelling and Feature Selection for Intrusion Detection Systems
  • Author
    Aparicio-Navarro, Francisco J. ; Kyriakopoulos, Konstantinos G. ; Parish, David J.
  • Author_Institution
    Sch. of Electron., Electr. & Syst. Eng., Loughborough Univ., Loughborough, UK
  • fYear
    2014
  • fDate
    6-8 Oct. 2014
  • Firstpage
    46
  • Lastpage
    51
  • Abstract
    Correctly labelled datasets are commonly required. Three scenarios highlight this need. First, supervised Intrusion Detection Systems (IDSs) need labelled datasets in order to be trained. Second, the real nature of the analysed datasets must be known in order to evaluate how efficiently an IDS detects intrusions. The third scenario is the use of feature selection that works only if the processed datasets are labelled. Under normal conditions, collecting labelled datasets from real networks is impossible. Currently, datasets are mainly labelled through off-line forensic analysis, which is impractical because it does not allow real-time implementation. We have developed a novel approach that automatically generates labelled network traffic datasets using an unsupervised anomaly-based IDS. The resulting labelled datasets are subsets of the original unlabelled datasets. The labelled dataset is then processed using a Genetic Algorithm (GA) based approach, which performs feature selection. The GA has been implemented to automatically provide the set of metrics that yields the most appropriate intrusion detection results.
  • Keywords
    digital forensics; feature selection; genetic algorithms; GA based approach; automatic dataset labelling; feature selection; genetic algorithm; labelled network traffic dataset; offline forensic analysis; real-time implementation; supervised intrusion detection systems; unlabelled dataset; unsupervised anomaly based IDS; Biological cells; Feature extraction; Genetic algorithms; Intrusion detection; Labeling; Measurement; Telecommunication traffic; Automatic Labelling; Feature Selection; Genetic Algorithm; Network Traffic Labelling; Unsupervised Anomaly IDS;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Military Communications Conference (MILCOM), 2014 IEEE
  • Conference_Location
    Baltimore, MD
  • Type
    conf
  • DOI
    10.1109/MILCOM.2014.17
  • Filename
    6956736