Minimal dataset for Network Intrusion Detection Systems via dimensionality reduction

Author

Nziga, Jean-Pierre

Author_Institution

Grad. Sch. of Comput. & Inf. Sci., Nova Southeastern Univ., Fort Lauderdale, FL, USA

fYear

2011

fDate

26-28 Sept. 2011

Firstpage

168

Lastpage

173

Abstract

Network Intrusion Detection Systems (NIDS) monitor Internet traffic to detect malicious activities such as denial-of-service attacks, network access by unauthorized users, attempts to gain additional privileges, and port scans. The volume of data that a NIDS must analyze is too large for efficient processing. Prior studies developed feature selection and feature extraction techniques to reduce the size of the data, but none has determined exactly how far the dataset should be reduced. Dimensionality reduction is a field of machine learning that consists of mapping high-dimensional data into a lower-dimensional space while preserving the important features of the original dataset. Dimensionality reduction techniques have been used to reduce the amount of data in applications such as speech signals, digital photographs, fMRI scans, DNA microarrays, and hyperspectral data. The purpose of this paper is to determine the minimal amount of data required for successful intrusion detection. This evaluation is necessary to improve the efficiency of NIDS in identifying existing attack patterns and recognizing new intrusions in real time. Two dimensionality reduction techniques are used: one linear (Principal Component Analysis) and one nonlinear (Multidimensional Scaling). The reduced data are then submitted to two classification algorithms, J48 (C4.5) and Naïve Bayes. The study was conducted using the KDD Cup 99 data. Experimental results show optimal performance with reduced datasets of 4 dimensions for J48 and 12 dimensions for Naïve Bayes.
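
The evaluation the abstract describes (reduce the data to k dimensions, classify, and see how accuracy changes with k) can be illustrated with a small dependency-free sketch. It is not the paper's implementation. It assumes synthetic two-class data in place of KDD Cup 99, a hand-written PCA (power iteration with deflation), and a hand-written Gaussian Naïve Bayes classifier; J48 and Multidimensional Scaling are omitted. All function and variable names are hypothetical.

```python
import math
import random


def top_components(rows, k, iters=200):
    """Return (mean, top-k principal directions) via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # The Rayleigh quotient estimates the eigenvalue used for deflation.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        comps.append(v)
    return mean, comps


def project(rows, mean, comps):
    """Project rows onto the given principal directions."""
    return [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in comps]
            for r in rows]


def fit_gnb(X, y):
    """Fit per-class log prior, feature means, and feature variances."""
    model = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        d = len(pts[0])
        mu = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        var = [sum((p[j] - mu[j]) ** 2 for p in pts) / len(pts) + 1e-9
               for j in range(d)]
        model[label] = (math.log(len(pts) / len(X)), mu, var)
    return model


def predict_gnb(model, x):
    """Return the label with the highest Gaussian log posterior."""
    def log_post(prior, mu, var):
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, mu, var))
    return max(model, key=lambda lbl: log_post(*model[lbl]))


# Synthetic stand-in for traffic records: 6 features, only feature 0 is informative.
rng = random.Random(42)
DIMS = 6
data, labels = [], []
for i in range(400):
    label = i % 2  # 0 = normal, 1 = attack
    row = [rng.gauss(0, 1) for _ in range(DIMS)]
    row[0] += 4.0 * label
    data.append(row)
    labels.append(label)

train_X, train_y = data[:300], labels[:300]
test_X, test_y = data[300:], labels[300:]

# Fit PCA once on the training split, then evaluate each reduced dimensionality k.
mean, comps = top_components(train_X, DIMS)
accuracy_by_k = {}
for k in range(1, DIMS + 1):
    model = fit_gnb(project(train_X, mean, comps[:k]), train_y)
    reduced_test = project(test_X, mean, comps[:k])
    hits = sum(predict_gnb(model, x) == t for x, t in zip(reduced_test, test_y))
    accuracy_by_k[k] = hits / len(test_y)

for k, acc in accuracy_by_k.items():
    print(f"k={k}: accuracy={acc:.3f}")
```

In this toy setting the smallest k whose accuracy matches the best observed value plays the role of the "minimal dataset" dimensionality; the figures reported in the abstract (4 dimensions for J48, 12 for Naïve Bayes) come from the paper's own experiments, not from this sketch.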

Keywords

Internet; computer network security; learning (artificial intelligence); principal component analysis; telecommunication traffic; C4.5; Internet traffic; J48; NIDS; classification algorithm; denial of service attack; dimensionality reduction; feature extraction; feature selection; machine learning; multidimensional scaling; naive Bayes method; network intrusion detection system; nonlinear technique; principal component analysis; Accuracy; Algorithm design and analysis; Classification algorithms; Covariance matrix; Feature extraction; Intrusion detection; Principal component analysis; Dimensionality Reduction; Intrusion Detection; KDD; Multidimensional Scaling; Principal Component Analysis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Digital Information Management (ICDIM), 2011 Sixth International Conference on

Conference_Location

Melbourne, QLD

ISSN

Pending

Print_ISBN

978-1-4577-1538-9

Type

conf

DOI

10.1109/ICDIM.2011.6093368

Filename

6093368