• DocumentCode
    618135
  • Title

    Analyzing string format-based classifiers for botnet detection: GP and SVM

  • Author

    Haddadi, Fariba ; Zincir-Heywood, A. Nur

  • Author_Institution
    Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
  • fYear
    2013
  • fDate
    20-23 June 2013
  • Firstpage
    2626
  • Lastpage
    2633
  • Abstract
    The domain name system (DNS) is an essential component of the Internet. Because it is expected to be used by all legitimate users and applications, it is generally subject to fewer inspections, restrictions and filters. Botnets rely on this open component to carry out their malicious operations. To avoid a single point of failure and evade static blacklists and firewalls, they employ DNS-based methods to frequently generate new domain names automatically. Stateful-SBB, a form of genetic programming (GP), was previously designed and developed by the authors to detect these automatically generated domain names using minimal a priori information, and was shown to be efficient. In this paper, we compare Stateful-SBB against the String Subsequence Kernel (SSK) and SSK with Lambda Pruning (SSK-LP), which are based on support vector machines (SVM) and also use string format inputs. Analyzing the domain names that each classifier chooses as part of its solution in the classification process, we notice that 50% to 63% of Stateful-SBB's frequently selected points on the Pareto front are also used by SSK and SSK-LP, respectively. By analyzing these common domain names, we identify some of the characteristics of botnet domain names. Moreover, we introduce a pruned version of Stateful-SBB that reduces solution complexity by 83% while maintaining the same high accuracy.
  • Keywords
    Internet; data analysis; genetic algorithms; pattern classification; security of data; support vector machines; DNS-based method; GP; SSK with lambda pruning; SVM; Stateful-SBB; botnet detection; classification process; classifier analysis; domain name system; genetic programming; string format input; string format-based classifier; string subsequence kernel; Computers; Feature extraction; Kernel; Servers; Training; botnet domain name detection; evolutionary computation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2013 IEEE Congress on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4799-0453-2
  • Electronic_ISBN
    978-1-4799-0452-5
  • Type

    conf

  • DOI
    10.1109/CEC.2013.6557886
  • Filename
    6557886