  • DocumentCode
    2027011
  • Title
    A lexical approach for classifying malicious URLs
  • Author
    Darling, Michael ; Heileman, Greg ; Gressel, Gilad ; Ashok, Aravind ; Poornachandran, Prabaharan
  • Author_Institution
    Electr. & Comput. Eng., Univ. of New Mexico, Albuquerque, NM, USA
  • fYear
    2015
  • fDate
    20-24 July 2015
  • Firstpage
    195
  • Lastpage
    202
  • Abstract
    Given the continuous growth of malicious activity on the Internet, there is a need for intelligent systems that identify malicious web pages. URL analysis has been shown to be an effective tool for detecting phishing, malware, and other attacks. Previous studies have classified URLs using a combination of lexical features, network traffic, hosting information, and other strategies. These approaches require time-intensive lookups that introduce significant delay in real-time systems. In this paper, we describe a lightweight approach that classifies malicious web pages using URL lexical analysis alone. Our goal is to explore the upper bound on the classification accuracy of a purely lexical approach and to develop a scalable method that could be used in a real-time system. The resulting classification system correctly classifies URLs of malicious web pages with 99.1% accuracy, a 0.4% false positive rate, and an F1-score of 98.7, with an average classification time of 0.62 milliseconds. Our method also outperforms similar approaches when classifying out-of-sample data.
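    To illustrate what "URL lexical analysis alone" means in practice, the sketch below turns a URL string into a small numeric feature vector using only the text of the URL, with no DNS, WHOIS, or network lookups. The feature set and the function names (`lexical_features`, `_is_ip_literal`) are hypothetical examples chosen for illustration; they are not the features or classifier described in the paper, which this record does not specify.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def _is_ip_literal(host: str) -> bool:
    """Return True if the host is a bare IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Extract a hypothetical, illustrative set of lexical features from a URL.

    Only the URL string is inspected; no network lookups are performed,
    which is what keeps a purely lexical approach fast.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and stripped of port/brackets
    n = len(url)
    counts = Counter(url)
    # Shannon entropy (bits per character) of the URL's character distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "url_length": n,
        "host_length": len(host),
        "path_depth": parts.path.count("/"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "has_at_sign": int("@" in url),
        "has_ip_host": int(_is_ip_literal(host)),
        "char_entropy": entropy,
    }


if __name__ == "__main__":
    for u in ("http://example.com/a/b?x=1", "http://192.168.0.1/login@evil"):
        print(u, lexical_features(u))
```

    Because each feature is a single pass over the string, extraction cost grows only with URL length; vectors like these could then be passed to any standard supervised classifier.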
  • Keywords
    Internet; Web sites; classification; computer crime; computer network security; invasive software; unsolicited e-mail; URL analysis; URL lexical analysis; classification system; hosting information; intelligent systems; lexical approach; lexical features; malicious URL classification; malicious Web pages; malware; network traffic; phishing; Accuracy; Data models; Feature extraction; Malware; Training data; Uniform resource locators; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing & Simulation (HPCS), 2015 International Conference on
  • Conference_Location
    Amsterdam
  • Print_ISBN
    978-1-4673-7812-3
  • Type
    conf
  • DOI
    10.1109/HPCSim.2015.7237040
  • Filename
    7237040