Title :
Using Multiclass Machine Learning Methods to Classify Malicious Behaviors Aimed at Web Systems
Author :
Goseva-Popstojanova, Katerina ; Anastasovski, Goce ; Pantev, R.
Author_Institution :
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
Abstract :
The number of vulnerabilities in and attacks on Web systems shows an increasing trend, and these tend to dominate on the Internet. Furthermore, due to their popularity and users' ability to create content, Web 2.0 applications have become particularly attractive targets. These trends clearly illustrate the need for a better understanding of malicious cyber activities based on both qualitative and quantitative analysis. This paper focuses on multiclass classification of malicious Web activities using three supervised machine learning methods: J48, PART, and Support Vector Machines (SVM). The empirical analysis is based on data collected over a period of nine months by a high-interaction honeypot consisting of a three-tier Web system, which included Web 2.0 applications (i.e., a blog and a wiki). Our results show that supervised learning methods can be used to efficiently distinguish among multiple vulnerability scan and attack classes, with high recall and precision values for all but several very small classes. For our dataset, the decision-tree-based methods J48 and PART perform slightly better than SVM in terms of overall accuracy and weighted recall. Additionally, J48 and PART require fewer than half of the features (i.e., session attributes) used by SVM, and they execute much faster. Therefore, they appear to be the clear methods of choice.
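The evaluation measures named in the abstract (per-class precision and recall, overall accuracy, weighted recall) can be illustrated with a minimal sketch. This is not the authors' code or data: the class labels (`scan`, `spam`, `sqli`), the ten example sessions, and all function names below are hypothetical and chosen only for illustration. It assumes weighted recall means per-class recall weighted by the number of true instances of each class, in which case it coincides with overall accuracy.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen."""
    true_count = Counter(y_true)   # instances that truly belong to each class
    pred_count = Counter(y_pred)   # instances predicted as each class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        metrics[label] = (precision, recall)
    return metrics


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    metrics = per_class_metrics(y_true, y_pred)
    true_count = Counter(y_true)
    return sum(n * metrics[c][1] for c, n in true_count.items()) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of sessions assigned to the correct class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for ten sessions: one large, one medium, one tiny class.
y_true = ["scan"] * 6 + ["spam"] * 3 + ["sqli"]
y_pred = ["scan"] * 5 + ["spam"] + ["spam"] * 3 + ["scan"]

metrics = per_class_metrics(y_true, y_pred)
overall = accuracy(y_true, y_pred)
w_recall = weighted_recall(y_true, y_pred)
```

In this toy example the single `sqli` session is misclassified, so that class has zero recall, yet the weighted recall stays high because the class carries little weight. This is one way a classifier can score well overall while performing poorly on very small classes.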
Keywords :
Internet; learning (artificial intelligence); pattern classification; security of data; support vector machines; J48; PART; SVM; Web 2.0 applications; attack classes; high-interaction honeypot; malicious behavior classification; malicious cyber activities; multiclass machine learning methods; multiclass malicious Web activities classification; qualitative analysis; quantitative analysis; supervised machine learning methods; three-tier Web system; Accuracy; Blogs; Electronic publishing; Information services; Learning systems; Web 2.0 security; attacks; empirical study; multiclass classification; vulnerability scans
Conference_Title :
Software Reliability Engineering (ISSRE), 2012 IEEE 23rd International Symposium on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4673-4638-2
DOI :
10.1109/ISSRE.2012.30