Title :
Automatic protocol feature word construction based on machine learning
Author :
Haifeng Li; Bin Zhang; Bo Shuai; Jian Wang; Chaojing Tang
Author_Institution :
School of Electronic Science and Engineering, National University of Defense Technology, Changsha, Hunan, China, 410073
Abstract :
Automatic protocol reverse engineering for application protocol is becoming more and more important for many applications such as application protocol analyzer, penetration testing, intrusion prevention and detection. Unfortunately, many techniques for extracting the protocol message format specifications of unknown applications often have some limitations for few priori information or the time-consuming problem. Protocol feature words are byte subsequences within traffic payload that could help distinguish application protocols. In this paper, a new approach is proposed for extracting the protocol message format specifications of unknown applications which is based on the Latent Dirichlet Allocation (LDA) model and Huffman Tree Support Vector Machine (HT-SVM). Firstly, the key words are extracted by utilizing the LDA model, which is a kind of machine learning in document library to extract the theme structure named topic. Secondly, the HT-SVM method is applied to constructing the feature words based on the above process. The proposed approach is implemented and evaluated to infer message format specifications of SMTP binary protocol. Experimental results show that the approach accurately parses and infers SMTP protocol with highly recall rate.
Keywords :
"Protocols","Artificial neural networks","Support vector machines"
Conference_Titel :
Progress in Informatics and Computing (PIC), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8086-7
DOI :
10.1109/PIC.2015.7489816