Title :
Minimal dataset for Network Intrusion Detection Systems via dimensionality reduction
Author :
Nziga, Jean-Pierre
Author_Institution :
Grad. Sch. of Comput. & Inf. Sci., Nova Southeastern Univ., Fort Lauderdale, FL, USA
Abstract :
Network Intrusion Detection Systems (NIDS) monitor Internet traffic to detect malicious activities, including but not limited to denial-of-service attacks, network access by unauthorized users, attempts to gain additional privileges, and port scans. The volume of data that a NIDS must analyze is too large to process efficiently. Prior studies developed feature selection and feature extraction techniques to reduce the size of the data, but none has focused on determining exactly how much the dataset should be reduced. Dimensionality reduction is a field of machine learning that consists of mapping high-dimensional data into a lower-dimensional space while preserving the important features of the original dataset. Dimensionality reduction techniques have been used to reduce the amount of data in applications such as speech signals, digital photographs, fMRI scans, DNA microarrays, and hyperspectral data. The purpose of this paper is to find the minimal amount of data required for successful intrusion detection. This evaluation is necessary to improve the efficiency of NIDS in identifying existing attack patterns and recognizing new intrusions in real time. Two dimensionality reduction techniques are used: one linear (Principal Component Analysis) and one nonlinear (Multidimensional Scaling). The reduced data is then submitted to two classification algorithms, J48 (C4.5) and Naïve Bayes. The study was conducted using the KDD Cup 99 dataset. Experimental results show optimal performance with reduced datasets of 4 dimensions for J48 and 12 dimensions for Naïve Bayes.
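The evaluation loop described in the abstract (reduce the data to k dimensions, classify, compare accuracy across k) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes NumPy, uses synthetic two-class data in place of KDD Cup 99, covers only the PCA and Naïve Bayes combination (Multidimensional Scaling and J48 are omitted), and all function names are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the top-k principal axes (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal axes, sorted by variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, axes):
    """Project samples onto the principal axes learned from the training set."""
    return (X - mean) @ axes.T

def gnb_fit(X, y):
    """Estimate per-class means, variances, and log-priors (Gaussian Naive Bayes)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps the variance strictly positive.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, variances, log_priors

def gnb_predict(model, X):
    """Assign each sample to the class with the highest posterior log-probability."""
    classes, means, variances, log_priors = model
    diff = X[:, None, :] - means[None, :, :]          # (samples, classes, features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + diff ** 2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + log_priors, axis=1)]

def accuracy_by_dimension(X_train, y_train, X_test, y_test, dims):
    """Map each candidate dimensionality k to the test accuracy obtained with it."""
    results = {}
    for k in dims:
        mean, axes = pca_fit(X_train, k)
        model = gnb_fit(pca_transform(X_train, mean, axes), y_train)
        pred = gnb_predict(model, pca_transform(X_test, mean, axes))
        results[k] = float((pred == y_test).mean())
    return results

def make_synthetic(n=600, n_features=20, seed=0):
    """Synthetic stand-in for real traffic records (assumption, not KDD Cup 99)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, n_features))
    X[:, :3] += 3.0 * y[:, None]  # only the first three features carry class signal
    return X, y

X, y = make_synthetic()
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
results = accuracy_by_dimension(X_train, y_train, X_test, y_test,
                                dims=[1, 2, 4, 8, 12])
```

Here `results` maps each candidate dimensionality to a test accuracy, so the smallest k whose accuracy matches the best observed value can be read off directly. Fitting PCA on the training split only, and reusing its mean and axes for the test split, avoids leaking test information into the projection.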
Keywords :
Internet; computer network security; learning (artificial intelligence); principal component analysis; telecommunication traffic; C4.5; Internet traffic; J48; NIDS; classification algorithm; denial of service attack; dimensionality reduction; feature extraction; feature selection; machine learning; multidimensional scaling; naive Bayes method; network intrusion detection system; nonlinear technique; principal component analysis; Accuracy; Algorithm design and analysis; Classification algorithms; Covariance matrix; Feature extraction; Intrusion detection; Principal component analysis; Dimensionality Reduction; Intrusion Detection; KDD; Multidimensional Scaling; Principal Component Analysis;
Conference_Title :
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location :
Melbourne, QLD
Print_ISBN :
978-1-4577-1538-9
DOI :
10.1109/ICDIM.2011.6093368