• DocumentCode
    3423274
  • Title
    Detecting scareware by mining variable length instruction sequences
  • Author
    Shahzad, R.K.; Lavesson, Nils
  • Author_Institution
    Sch. of Comput., Blekinge Inst. of Technol., Karlskrona, Sweden
  • fYear
    2011
  • fDate
    15-17 Aug. 2011
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Scareware is a recent type of malicious software that may pose financial and privacy-related threats to novice users. Traditional countermeasures, such as anti-virus software, require regular updates and often lack the capability of detecting novel (unseen) instances. This paper presents a scareware detection method that is based on the application of machine learning algorithms to learn patterns in extracted variable length opcode sequences derived from instruction sequences of binary files. The patterns are then used to classify software as legitimate or scareware but they may also reveal interpretable behavior that is unique to either type of software. We have obtained a large number of real world scareware applications and designed a data set with 550 scareware instances and 250 benign instances. The experimental results show that several common data mining algorithms are able to generate accurate models from the data set. The Random Forest algorithm is shown to outperform the other algorithms in the experiment. Essentially, our study shows that, even though the differences between scareware and legitimate software are subtler than between, say, viruses and legitimate software, the same type of machine learning approach can be used in both of these dissimilar cases.
  • Keywords
    data mining; invasive software; learning (artificial intelligence); pattern classification; sequences; binary files; data mining algorithms; machine learning algorithms; malicious software; privacy-related threats; random forest algorithm; scareware detection method; variable length instruction sequences; Classification algorithms; Data mining; Feature extraction; Malware; Software; Software algorithms; Vocabulary; Classification; Instruction Sequence; Scareware;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Security South Africa (ISSA), 2011
  • Conference_Location
    Johannesburg
  • Print_ISBN
    978-1-4577-1481-8
  • Type
    conf
  • DOI
    10.1109/ISSA.2011.6027523
  • Filename
    6027523