Title :
Phishing URL Detection Using URL Ranking
Author :
Feroz, Mohammed Nazim ; Mengel, Susan
Author_Institution :
Comput. Sci., Texas Tech Univ., Lubbock, TX, USA
Abstract :
The openness of the Web creates opportunities for criminals to upload malicious content. Despite extensive research, email-based spam filtering techniques are unable to protect other web services. Therefore, a countermeasure is needed that generalizes across web services to protect the user from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used to categorize URLs, and the categories returned serve as a supplemental source of information that enables the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give URLs a rank.
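The cluster-ID-as-feature idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the paper's actual lexical features, clustering algorithm, or classifier, so the five features, the plain k-means routine, and the names `lexical_features`, `kmeans`, and `add_cluster_feature` are hypothetical. Host-based features and the reputation-service lookup are omitted because they require network access.

```python
from urllib.parse import urlparse


def lexical_features(url):
    """Return a small, illustrative lexical feature vector (not the paper's set)."""
    host = urlparse(url).hostname or ""
    return [
        float(len(url)),                       # total URL length
        float(host.count(".")),                # dots in the host name
        float(sum(c.isdigit() for c in url)),  # digit count
        float("@" in url),                     # 1.0 if '@' appears anywhere
        float(url.count("-")),                 # hyphen count
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means, seeded deterministically with the first k points (assumed choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_feature(urls, k=2):
    """Cluster all URLs, then append each URL's cluster ID to its feature vector."""
    vectors = [lexical_features(u) for u in urls]
    labels = kmeans(vectors, k)
    return [vec + [float(lab)] for vec, lab in zip(vectors, labels)]
```

The augmented vectors returned by `add_cluster_feature` would then be passed to whatever supervised classifier is in use; that step is left out because the abstract does not name one.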
Keywords :
Web services; Web sites; computer crime; information filtering; pattern classification; pattern clustering; unsolicited e-mail; URL categorization mechanism; URL classification; URL ranking; cluster ID; clustering; email based spam filtering technique; host-based feature; lexical feature; malicious content; online URL reputation service; phishing URL detection; phishing host URL; predictive feature; Accuracy; Classification algorithms; Clustering algorithms; Feature extraction; Security; Servers; Uniform resource locators; Classification; Clustering; Feature Vector; URL Ranking; Web Categorization
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.97