DocumentCode
70390
Title
An Information-Theoretical Approach to High-Speed Flow Nature Identification
Author
Khakpour, Amir R. ; Liu, Alex X.
Author_Institution
Dept. of Comput. Sci. & Eng., Michigan State Univ., East Lansing, MI, USA
Volume
21
Issue
4
fYear
2013
fDate
Aug. 2013
Firstpage
1076
Lastpage
1089
Abstract
This paper is the first to address the fundamental problem of identifying the content nature of a flow: text, binary, or encrypted. We propose Iustitia, a framework for identifying flow nature on the fly. The key observation behind Iustitia is that text flows have the lowest entropy, encrypted flows have the highest, and binary flows fall in between. We further extend Iustitia to finer-grained classification of binary flows, so that it can distinguish types of binary flows (such as images, videos, and executables) and even the file formats they carry (such as JPEG and GIF for images, MPEG and AVI for videos). The basic idea of Iustitia is to classify flows using machine learning techniques in which each feature is the entropy of a certain number of consecutive bytes. Our experimental results show that the classification can be done with high speed and high accuracy. On average, Iustitia classifies flows with 88.27% accuracy using a buffer size of 1 K, and its classification time is less than 10% of the packet interarrival time for 91.2% of flows.
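The entropy feature mentioned in the abstract can be illustrated with a minimal Python sketch. It follows one possible reading of "the entropy of every certain number of consecutive bytes": the buffered payload is cut into fixed-size blocks and the Shannon entropy of each block becomes one feature. The function names and the 256-byte block size are assumptions made for illustration, not details taken from the paper, and the machine learning classifier that would consume these features is omitted.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (range 0 to 8)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)  # frequency of each byte value in the block
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(payload: bytes, block_size: int = 256) -> List[float]:
    """One entropy value per consecutive block of the payload.

    block_size is a hypothetical choice, not a value from the paper.
    """
    return [
        shannon_entropy(payload[i:i + block_size])
        for i in range(0, len(payload), block_size)
    ]


if __name__ == "__main__":
    # A repetitive text-like buffer and a buffer that uses every byte value.
    text_like = b"the quick brown fox jumps over the lazy dog " * 24
    high_entropy = bytes(range(256)) * 4
    print(entropy_features(text_like[:1024]))
    print(entropy_features(high_entropy))
```

In this sketch a text-like buffer yields noticeably lower block entropies than a buffer that spreads evenly over all byte values, which is the ordering the abstract relies on to separate text, binary, and encrypted flows.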
Keywords
cryptography; data compression; entropy; image classification; learning (artificial intelligence); text analysis; video coding; AVI; GIF; Iustitia; JPEG; MPEG; binary flow classification; binary flow entropy; encrypted flows; high-speed flow nature identification; information-theoretical approach; machine learning techniques; packet interarrival time; text flows; Accuracy; Cryptography; Entropy; Payloads; Support vector machines; Training; Vectors; Flow content analysis; flow identification;
fLanguage
English
Journal_Title
Networking, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1063-6692
Type
jour
DOI
10.1109/TNET.2012.2219591
Filename
6355645