Title : 
Word N-Gram Based Classification for Data Leakage Prevention

Author :
Alneyadi, Sultan ; Sithirasenan, E. ; Muthukkumarasamy, Vallipuram

Author_Institution :
Fac. of Sci., Environ., Eng. & Technol., Griffith Univ., Gold Coast, QLD, Australia

Abstract :
Revealing sensitive data to unauthorised personnel is a serious problem for many organizations and can lead to devastating consequences. Traditionally, data leakage was prevented through firewalls, VPNs and IDSs, but without much consideration of the sensitivity of the data. In recent years, new technologies such as data leakage prevention systems (DLPs) have been developed specifically to identify and protect sensitive data, or to monitor and detect sensitive data leakage. One of the most popular approaches used in DLPs is content analysis, in which the content of exchanged documents, stored data or even network traffic is monitored for sensitive data. Document contents are examined mainly using text analysis and text clustering methods. Text analysis, in turn, can be performed using methods such as pattern recognition, style variation and N-gram frequency. In this paper, we investigate the use of N-grams for data classification. Our method uses N-gram frequencies to classify documents in order to detect and prevent leakage of sensitive data. We have studied the effectiveness of N-grams in measuring the similarity between regular documents and existing classified documents.
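The abstract does not state which similarity or distance measure the authors use (the keywords mention only "N-gram profiles" and "matching distance"). The sketch below is therefore an illustrative assumption, not the authors' method: it builds word-bigram frequency profiles and compares a new document against profiles of already-classified documents using cosine similarity, a swapped-in standard measure. The function names `word_ngrams` and `classify`, the default `n=2`, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Tuple


def word_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams (tuples of n consecutive lowercase words) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # If the text has fewer than n words, the range is empty and so is the profile.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles (0.0 if either is empty)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document: str,
             classified_profiles: Dict[str, Counter],
             n: int = 2,
             threshold: float = 0.3) -> Tuple[Optional[str], float]:
    """Return the most similar classified label, or None if no score reaches the
    (hypothetical) threshold, together with the best similarity score."""
    doc_profile = word_ngrams(document, n)
    best_label: Optional[str] = None
    best_score = 0.0
    for label, profile in classified_profiles.items():
        score = cosine_similarity(doc_profile, profile)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


if __name__ == "__main__":
    # Hypothetical profile built from one "confidential" reference document.
    profiles = {
        "confidential": word_ngrams(
            "the merger with acme corp is confidential until the board approves the merger", 2
        ),
    }
    label, score = classify("board approves the merger with acme corp next week", profiles)
    print(label, f"{score:.3f}")  # → confidential 0.661
```

In a DLP setting, a document whose best score meets the threshold would be flagged as resembling a classified document; one with no shared bigrams scores 0.0 and is returned with the label `None`. Character N-grams, other values of `n`, or a rank-based profile distance could be substituted in `word_ngrams` and `cosine_similarity` without changing `classify`.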

Keywords :
organisational aspects; pattern classification; pattern clustering; security of data; text analysis; DLP; IDS; N-gram frequency; VPN; data classification purposes; data leakage prevention systems; documents classification; exchanged documents; firewalls; network traffic; organizations; pattern recognition; stored data; style variation; text analysis; text clustering methods; unauthorised personnel; word N-gram based classification; Communication cables; Encryption; Government; Monitoring; Testing; Virtual private networks; Data leakage prevention; N-gram profiles; N-grams; matching distance;

Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on

Conference_Location :
Melbourne, VIC

DOI :
10.1109/TrustCom.2013.71