Title :
Behavior-based network traffic synthesis
Author :
Song, Yingbo ; Stolfo, Salvatore J. ; Jebara, Tony
Author_Institution :
Dept. of Comput. Sci., Columbia Univ., New York, NY, USA
Abstract :
Modern network security research has demonstrated a clear need for the open sharing of traffic datasets between organizations, a need that has so far been outweighed by the challenge of first removing sensitive content from the data. Network data anonymization is an emerging field dedicated to solving this problem, with a main focus on removing identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts may yield fingerprints that are just as distinguishable as these explicit identifiers. This result highlights certain shortcomings in current anonymization frameworks, particularly their disregard for the behavioral idiosyncrasies of network protocols, applications, and users. Network traffic synthesis (or simulation) is a closely related, complementary approach which, while more difficult to execute accurately, has the potential for far greater flexibility. This paper leverages the statistical idiosyncrasies of network behavior to augment anonymization and traffic-synthesis techniques through machine-learning models specifically designed to capture host-level behavior. We present the design of a system that can automatically learn models of network host behavior across time and then use these models to replicate the original behavior and to interpolate across gaps in the original traffic, and we demonstrate how to generate new, diverse behaviors. Further, we measure the similarity of the synthesized data to the original, providing a quantifiable estimate of data fidelity.
Keywords :
computer network security; data privacy; learning (artificial intelligence); statistical analysis; telecommunication security; telecommunication traffic; IP addresses; behavior-based network traffic synthesis; data fidelity estimation; host-level behavior capture; machine-learning models; network data anonymization; network security research; pierce privacy; sensitive content removal; statistical artifacts; statistical-idiosyncrasies; traffic datasets; usernames; Data models; Electronic mail; Histograms; MIMICs; Mathematical model; Protocols; Security;
Conference_Title :
Technologies for Homeland Security (HST), 2011 IEEE International Conference on
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4577-1375-0
DOI :
10.1109/THS.2011.6107893