DocumentCode :
2571576
Title :
Data preprocessing for distance-based unsupervised Intrusion Detection
Author :
Said, Dina ; Stirling, Leia ; Federolf, P. ; Barker, Ken
Author_Institution :
Dept. of Comput. Sci., Univ. of Calgary, Calgary, AB, Canada
fYear :
2011
fDate :
19-21 July 2011
Firstpage :
181
Lastpage :
188
Abstract :
Since Intrusion Detection Systems (IDSs) operate in real time, they should be lightweight so that they detect intrusions as fast as possible. Distance-based Outlier Detection (DBOD) is one of the most widely used techniques for detecting outliers due to its simplicity and efficiency. Additionally, DBOD is an unsupervised approach, which overcomes the lack of training datasets with known intrusions. However, since IDSs usually have high-dimensional datasets, DBOD becomes subject to the curse of dimensionality. Furthermore, intrusion datasets should be normalized before calculating pairwise distances between observations. The purpose of this research is to conduct a comparative study of different normalization methods in conjunction with a well-known feature extraction technique, Principal Component Analysis (PCA), so that the efficiency of these methods as data preprocessing techniques can be investigated when applying DBOD to detect intrusions. Experiments were performed using two distance metrics: Euclidean distance and Mahalanobis distance. We further examined PCA using 7 threshold values that determine the number of principal components to retain according to their total contribution to the variability of the features. These approaches were evaluated using the KDD Cup 1999 intrusion detection (KDD-Cup) dataset. The main purpose of this study is to find the best attribute normalization method, along with the correct threshold value for PCA, so that a fast unsupervised IDS can discover intrusions effectively. The results recommend using the Log normalization method combined with the Euclidean distance when performing PCA.
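The pipeline recommended in the abstract (Log normalization, then PCA with a variance threshold, then Euclidean distance-based outlier scoring) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not give the exact form of the Log normalization (log(1 + x) is assumed), does not list the seven threshold values (0.95 is only illustrative), and does not say which DBOD variant was used (distance to the k-th nearest neighbour is one common choice). The function names are hypothetical.

```python
import numpy as np


def log_normalize(X):
    """Assumed form of Log normalization: log(1 + x) on non-negative features."""
    return np.log1p(X)


def pca_by_variance(X, threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    share of the total variance reaches `threshold` (illustrative value)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2                                 # proportional to component variances
    ratio = np.cumsum(var) / var.sum()           # cumulative explained-variance ratio
    k = int(np.searchsorted(ratio, threshold)) + 1
    k = min(k, len(S))                           # guard against rounding at threshold = 1.0
    return Xc @ Vt[:k].T, k


def knn_distance_scores(Z, k=5):
    """Outlier score = Euclidean distance to the k-th nearest neighbour.
    Builds the full pairwise matrix, so it suits small samples only."""
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)                  # ignore distance to self
    return np.sort(D, axis=1)[:, k - 1]
```

A possible use is `Z, n_comp = pca_by_variance(log_normalize(X), 0.95)` followed by `scores = knn_distance_scores(Z)`, flagging the observations with the largest scores as candidate intrusions. The Mahalanobis variant studied in the paper is not covered by this sketch.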
Keywords :
principal component analysis; security of data; unsupervised learning; DBOD; Euclidean distance; IDS; Mahalanobis distance; PCA; data preprocessing; dimensionality problem; distance based outlier detection; distance based unsupervised intrusion detection; feature extraction technique; principle component analysis; Equations; Euclidean distance; Feature extraction; Intrusion detection; Principal component analysis; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4577-0582-3
Type :
conf
DOI :
10.1109/PST.2011.5971981
Filename :
5971981