• DocumentCode
    2526642
  • Title

    Detecting Worms Using Data Mining Techniques: Learning in the Presence of Class Noise

  • Author

    Ismail, Ismahani ; Marsono, Muhammad Nadzir ; Nor, Sulaiman Mohd

  • Author_Institution
    Fac. of Electr. Eng., Univ. Teknol. Malaysia, Johor Bahru, Malaysia
  • fYear
    2010
  • fDate
    15-18 Dec. 2010
  • Firstpage
    187
  • Lastpage
    194
  • Abstract
    Worms are self-contained programs that spread over the Internet. They cause problems such as loss of information, information theft and denial-of-service attacks. The first part of the paper evaluates content-based worm detection using all the machine learning techniques available in the WEKA data mining tool. The four most accurate and reasonably fast classifiers are identified for further analysis: Naive Bayes, J48, SMO and Winnow. Results show that machine learning techniques can classify worms with up to 99% accuracy. In terms of accuracy, J48 performs better than the other algorithms, while Naive Bayes and Winnow perform best in terms of speed. The second part of the paper analyzes the accuracy of these four classifiers in the presence of class noise in the learning corpora. With class noise ranging between 0% and 50% injected into the positive and negative corpora, simulation results show a gradual decrease in accuracy and an increase in false positives and false negatives for all analyzed techniques. Class noise affects false positives more significantly than false negatives. The results show that worm detection with classification algorithms cannot tolerate class noise in the learning corpora.
  • Keywords
    Internet; data mining; invasive software; learning (artificial intelligence); noise; pattern classification; Internet vulnerability; Naive Bayes method; WEKA data mining tool; Winnow algorithm; class noise; content classification; machine learning technique; self contained program; worm detection; Accuracy; Data mining; Feature extraction; Grippers; Noise; Payloads; Training; class noise; data-mining techniques; worm detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal-Image Technology and Internet-Based Systems (SITIS), 2010 Sixth International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4244-9527-6
  • Electronic_ISBN
    978-0-7695-4319-2
  • Type

    conf

  • DOI
    10.1109/SITIS.2010.41
  • Filename
    5714551
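
The class-noise experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `inject_class_noise` and `false_rates` are hypothetical, labels are assumed to be binary (1 = worm, 0 = benign), and noise is assumed to be injected by flipping a fixed fraction of labels chosen uniformly at random, which may differ from the procedure used in the paper.

```python
import random


def inject_class_noise(labels, noise_rate, seed=0):
    """Return a copy of binary labels with a fraction `noise_rate` flipped.

    Assumption: noise is uniform label flipping over the whole corpus.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Pick distinct positions so exactly n_flip labels change.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def false_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate); 1 = worm, 0 = benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = len(y_true) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

In such a setup, a classifier would be trained on the noisy labels and evaluated against the clean ones, with `noise_rate` swept from 0.0 to 0.5 to mirror the 0% to 50% range mentioned in the abstract.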