DocumentCode :
3073637
Title :
Adaptable N-gram classification model for data leakage prevention
Author :
Alneyadi, Sultan ; Sithirasenan, E. ; Muthukkumarasamy, Vallipuram
Author_Institution :
Sch. of Inf. & Commun. Technol., Griffith Univ., Gold Coast, QLD, Australia
fYear :
2013
fDate :
16-18 Dec. 2013
Firstpage :
1
Lastpage :
8
Abstract :
Data confidentiality, integrity and availability are the ultimate goals of all information security mechanisms. However, most of these mechanisms do not proactively protect sensitive data; rather, they work under predefined policies and conditions to protect data in general. A few systems, such as anomaly-based intrusion detection systems (IDS), might work independently without much administrative intervention, but they pay no particular attention to the sensitivity of data. New mechanisms called data leakage prevention systems (DLPs) have been developed to mitigate the risk of sensitive data leakage. Current DLPs mostly use data fingerprinting and exact and partial document matching to classify sensitive data. These approaches can be seriously limited because they are susceptible to data misidentification. In this paper, we investigate the use of N-gram statistical analysis for data classification. Our method uses N-gram frequencies to classify documents under distinct categories. We use simple taxicab geometry to compute the similarity between documents and existing categories. Moreover, we examine how removing the most common words and connecting phrases affects the overall classification. We aim to compensate for the limitations of current data classification approaches used in the field of data leakage prevention. We show that our method is capable of correctly classifying up to 90.5% of the tested documents.
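The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes character trigrams, relative (normalized) frequencies, a nearest-profile decision rule, and a small hypothetical stop-word list; none of these details are stated in the abstract, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}


def ngram_profile(text, n=3, remove_stop_words=False):
    """Return the relative frequency of each character n-gram in text."""
    words = text.lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    cleaned = " ".join(words)
    grams = [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def taxicab_distance(p, q):
    """Sum of absolute frequency differences over the union of n-grams."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def classify(document, category_profiles, n=3, remove_stop_words=False):
    """Return the category whose profile is closest to the document (assumed rule)."""
    doc_profile = ngram_profile(document, n, remove_stop_words)
    return min(
        category_profiles,
        key=lambda name: taxicab_distance(doc_profile, category_profiles[name]),
    )


if __name__ == "__main__":
    # Toy category texts, invented purely for illustration.
    categories = {
        "finance": ngram_profile("quarterly revenue and profit forecast for the bank"),
        "medical": ngram_profile("patient diagnosis and treatment record for the clinic"),
    }
    print(classify("bank profit forecast", categories))
```

In this sketch a smaller taxicab distance means a closer match, so the document is assigned to the category with the minimum distance. Passing `remove_stop_words=True` when building both the category profiles and the document profile gives a simple way to compare classification with and without common words.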
Keywords :
pattern classification; security of data; statistical analysis; N-grams statistical analysis; adaptable N-gram classification model; data availability; data classification; data confidentiality; data fingerprinting; data integrity; data leakage prevention; data misidentification; information security; partial document matching; taxicab geometry; Encryption; IP networks; Radio access networks; Servers; Virtual private networks; Data leakage prevention; N-gram profiles; N-grams
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing and Communication Systems (ICSPCS), 2013 7th International Conference on
Conference_Location :
Carrara, VIC
Type :
conf
DOI :
10.1109/ICSPCS.2013.6723919
Filename :
6723919