  • DocumentCode
    265334
  • Title
    Cloud antivirus cost model using machine learning

  • Author
    Hamzah, Ali Abdullah; Khattab, Sherif M.; El-Gamal, Salwa S.

  • Author_Institution
    Fac. of Comput. & Inf., Cairo Univ., Cairo, Egypt
  • fYear
    2014
  • fDate
    15-17 Dec. 2014
  • Abstract
    Cloud computing is a new generation of computing based on virtualization technology, and more and more applications are being deployed in cloud environments. Malware detection, or antivirus, software has recently been provided as a service in the cloud. A cloud antivirus provider hosts a number of virtual machines, each running the same or different antivirus engines on potentially different sets of workloads (files). From the provider's perspective, optimally allocating physical resources to these virtual machines is crucial to the efficiency of the infrastructure. We propose a search-based optimization approach for solving the resource-allocation problem in cloud-based antivirus deployments. An elaborate cost model of the file-scanning process in antivirus programs is instrumental to the proposed approach. The general architecture is presented and discussed, and a preliminary experimental investigation into the antivirus cost model is described. The cost model depends on many factors, such as total file size, size of the code segment, and the count and type of embedded files within the executable; however, no single one of these parameters can reliably predict file-scanning time on its own. Thus, a machine-learning approach that combines all these parameters as features is used to build a classifier for antivirus file-scanning time. The best results were obtained with the decision tree classifier, whose highest F-measure was 0.91; the highest F-measure using LogitBoost was 0.87, using support vector machine 0.85, and using naïve Bayes 0.82. We evaluated the accuracy of the classification model against a linear regression model using the root mean square (RMS) error measure. We found the classification model more accurate than the linear regression model, with average RMS values of 0.988 seconds and 2.44 seconds for the classification model and the linear regression model, respectively.
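    The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the features (total file size, code-segment size, embedded-file count), the synthetic data-generating process, the 2-second time bins, and the use of scikit-learn are all assumptions made for demonstration. The classifier predicts a discretized scan-time class, which is mapped back to its bin midpoint so that both models can be scored with the same RMS error.

    ```python
    # Sketch: decision-tree classification vs. linear regression for
    # predicting antivirus file-scan time, scored by RMS error.
    # All feature names and the synthetic data below are hypothetical.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 1000
    # Hypothetical features: total file size (MB), code-segment size (MB),
    # and count of embedded files within the executable.
    X = np.column_stack([
        rng.uniform(0.1, 50, n),
        rng.uniform(0.01, 5, n),
        rng.integers(0, 20, n),
    ])
    # Hypothetical scan time (seconds): nonlinear in the features plus noise.
    y = 0.2 * X[:, 0] + 2.0 * np.sqrt(X[:, 1]) + 0.5 * X[:, 2] + rng.normal(0, 1, n)

    # Discretize scan time into 2-second bins to form class labels.
    bins = np.arange(0, y.max() + 2, 2)
    labels = np.clip(np.digitize(y, bins), 1, len(bins) - 1)

    train, test = np.arange(n) < 800, np.arange(n) >= 800

    clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X[train], labels[train])
    # Map each predicted class back to its bin midpoint for a time estimate.
    pred_cls = bins[clf.predict(X[test]) - 1] + 1.0

    reg = LinearRegression().fit(X[train], y[train])
    pred_reg = reg.predict(X[test])

    rms_cls = np.sqrt(np.mean((pred_cls - y[test]) ** 2))
    rms_reg = np.sqrt(np.mean((pred_reg - y[test]) ** 2))
    print(f"classification RMS: {rms_cls:.2f}s, regression RMS: {rms_reg:.2f}s")
    ```

    On the authors' real dataset the classification model was the more accurate of the two; on synthetic data like the above, the outcome depends on how well each model family matches the (assumed) cost function.
    
    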
  • Keywords
    Bayes methods; cloud computing; computer viruses; embedded systems; learning (artificial intelligence); mean square error methods; pattern classification; regression analysis; resource allocation; support vector machines; virtual machines; virtualisation; F-measure value; RMS measure; antivirus engine; antivirus file scanning time; antivirus program; antivirus software; classification model; cloud antivirus cost model; cloud antivirus provider; cloud computing; cloud environment; cloud-based antivirus deployment; code segment; decision tree classifier; embedded file; file scanning process; linear regression model; logitboost; machine learning; malware detection; naïve Bayes; physical resources allocation; resource allocation problem; root mean square measure; search-based optimization approach; support vector machine; virtual machine; virtualization technology; Decision support systems; Antivirus; Cost Model; Machine Learning; Resource Allocation; Virtualization; cloud Antivirus;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatics and Systems (INFOS), 2014 9th International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-977-403-689-7
  • Type
    conf
  • DOI
    10.1109/INFOS.2014.7036708
  • Filename
    7036708