• DocumentCode
    623797
  • Title

    Combining supervised and unsupervised learning for zero-day malware detection

  • Author

    Comar, Prakash Mandayam ; Lei Liu ; Saha, Simanto ; Pang-Ning Tan ; Nucci, Antonio

  • Author_Institution
    Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
  • fYear
    2013
  • fDate
    14-19 April 2013
  • Firstpage
    2022
  • Lastpage
    2030
  • Abstract
    Malware is one of the most damaging security threats facing the Internet today. Despite a burgeoning literature, accurate malware detection remains elusive because of the increasing use of payload encryption and sophisticated obfuscation methods. In addition, the large variety of malware classes, their rapid proliferation and polymorphic capabilities, and the imperfections of real-world data (noise, missing values, etc.) continue to hinder the use of more sophisticated detection algorithms. This paper presents a novel machine-learning-based framework that detects known and newly emerging malware with high precision using layer 3 and layer 4 network traffic features. The framework combines the accuracy of supervised classification in detecting known classes with the adaptability of unsupervised learning in detecting new classes. It also introduces a tree-based feature transformation to overcome issues caused by imperfections in the data and to construct more informative features for the malware detection task. We demonstrate the effectiveness of the framework using real network data from a large Internet service provider.
  • Keywords
    Internet; invasive software; learning (artificial intelligence); tree data structures; Internet service provider; Internet today; layer 3 network traffic features; layer 4 network traffic features; obfuscation methods; payload encryption; polymorphic capabilities; security threats; supervised learning; tree-based feature transformation; unsupervised learning; zero-day malware detection; Feature extraction; Kernel; Malware; Payloads; Support vector machines; Training; Unsupervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2013 Proceedings IEEE
  • Conference_Location
    Turin
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4673-5944-3
  • Type

    conf

  • DOI
    10.1109/INFCOM.2013.6567003
  • Filename
    6567003