A Hybrid Learning from Multi-behavior for Malicious Domain Detection on Enterprise Network

Author

Liang Shi; Derek Lin; Chunsheng Victor Fang; Yan Zhai

Author_Institution

Alibaba Inc., Hangzhou, China

Year

2015

Firstpage

987

Lastpage

996

Abstract

Enterprises rely heavily on commercial security products to identify malicious domains for traffic blocking. However, new malicious domains emerge quickly and can remain undetected for extended periods before security vendors identify them. This poses a serious threat to enterprise security. We believe that enterprises can go one step further than commercial tools and services by leveraging information unavailable to those vendors: their own data. By combining an enterprise's own perimeter traffic logs with existing security intelligence, we can identify additional, previously undetected malicious domains to complement existing domain blacklists. In this paper, we propose a multi-behavioral hybrid learning approach that explores both the engineered feature space and the network graph. In particular, we combine supervised learning on our proposed rich behavioral feature set with semi-supervised learning, bootstrapped from connected component analysis, that exploits abundantly available unlabeled data. Our proposed method scales to large enterprise data. We demonstrate that this novel hybrid learning framework identifies previously unknown malicious domains with a low false positive rate.
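The abstract's semi-supervised step is bootstrapped from connected component analysis over enterprise traffic. The sketch below illustrates only that seeding idea, under stated assumptions. Perimeter logs are reduced to hypothetical (host, domain) pairs. Components are found with a union-find structure on the bipartite host-domain graph. Any unlabeled domain that shares a component with a blacklisted domain becomes a candidate. The function name `candidate_domains` and the popularity cutoff `max_hosts` are illustrative assumptions, not the authors' method. The paper's actual feature set, supervised model, and semi-supervised procedure are not reproduced here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to find connected components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen nodes as their own root.
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def candidate_domains(edges, blacklist, max_hosts=None):
    """Return unlabeled domains sharing a component with a blacklisted domain.

    edges: iterable of (host, domain) pairs (hypothetical log format).
    blacklist: set of domains already known to be malicious.
    max_hosts: hypothetical cutoff; domains contacted by more distinct
        hosts than this are dropped before building the graph.
    """
    edges = list(edges)
    hosts_per_domain = defaultdict(set)
    for host, domain in edges:
        hosts_per_domain[domain].add(host)

    uf = UnionFind()
    kept = set()
    for host, domain in edges:
        # Skip very popular domains so they do not merge unrelated hosts
        # into one giant component.
        if max_hosts is not None and len(hosts_per_domain[domain]) > max_hosts:
            continue
        # Tag node types so a host and a domain with the same name stay distinct.
        uf.union(("host", host), ("domain", domain))
        kept.add(domain)

    # Roots of the components that contain at least one known-bad domain.
    seeded = {uf.find(("domain", d)) for d in blacklist if d in kept}

    return sorted(
        d for d in kept
        if d not in blacklist and uf.find(("domain", d)) in seeded
    )


if __name__ == "__main__":
    logs = [
        ("h1", "bad.example"),
        ("h1", "unknown1.example"),
        ("h2", "unknown1.example"),
        ("h2", "unknown2.example"),
        ("h3", "benign.example"),
        ("h1", "popular.example"),
        ("h2", "popular.example"),
        ("h3", "popular.example"),
    ]
    print(candidate_domains(logs, {"bad.example"}, max_hosts=2))
    # prints ['unknown1.example', 'unknown2.example']
```

In this sketch, omitting a cutoff such as `max_hosts` lets a widely visited domain link most hosts into one component, so nearly every domain becomes a candidate. For that reason, component membership is best treated as a bootstrap for further learning rather than as a verdict.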

Keywords

"Supervised learning","Malware","Radio frequency","Semisupervised learning","Inference algorithms","Data mining"

Publisher

IEEE

Conference_Title

Data Mining Workshop (ICDMW), 2015 IEEE International Conference on

Electronic_ISSN

2375-9259

Type

conf

DOI

10.1109/ICDMW.2015.38

Filename

7395774