DocumentCode
265684
Title
GuidedTracker: Track the victims with access logs to finding malicious web pages
Author
Hongzhou Sha ; Qingyun Liu ; Zhou Zhou ; Chao Zheng
Author_Institution
Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear
2014
fDate
8-12 Dec. 2014
Firstpage
564
Lastpage
569
Abstract
Malicious web pages have become a malignant tumour for the Internet: they spread malicious code, steal people's private information, and deliver spam advertisements. Distinguishing them effectively from the huge number of normal web pages remains a major challenge in the era of big data. To detect malicious pages, one first collects candidate web pages that are live on the web, then filters out the mass of legitimate pages using fast filters, and finally examines the remaining pages using a precise but slow analyzer. However, these conventional techniques now face new challenges, including large scale, imbalanced data, and the use of cloaking techniques. To cope with these challenges, a malicious URL detection system must perform more efficiently. In this paper, we propose a system, named GuidedTracker, to search for suspected malicious pages. GuidedTracker starts from a seed set of known malicious pages. Then it automatically identifies victims based on the seed set and the visit relation database. Finally, the access records of these victims are used to identify other malicious pages. In this way, GuidedTracker increases the percentage of malicious URLs in the input URL stream submitted to the precise analyzer. To the best of our knowledge, GuidedTracker is the first to introduce visit relations to tackle the malicious URL detection problem. The introduction of visit relations limits the scope of URL inspection and gives the approach a self-learning ability. Experimental results show that the overall "toxicity" can be improved by 6.97%-50.38% compared with full inspection of access logs.
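The three steps described in the abstract (seed set, victim identification, candidate extraction from victims' access records) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `guided_candidates`, the `(client_id, url)` log schema, and the rule that any client who visited a seed URL counts as a victim are all assumptions made for illustration.

```python
from collections import defaultdict


def guided_candidates(seed_urls, visit_log):
    """Hypothetical sketch of the seed -> victims -> candidates flow.

    seed_urls: iterable of known malicious URLs (the seed set).
    visit_log: iterable of (client_id, url) pairs; this schema is an
               assumption standing in for the visit relation database.
    Returns (victims, candidates).
    """
    seeds = set(seed_urls)

    # Build the visit relation: which URLs each client accessed.
    visits_by_client = defaultdict(set)
    for client_id, url in visit_log:
        visits_by_client[client_id].add(url)

    # Assumed victim rule: a client that visited at least one seed URL.
    victims = {
        client_id
        for client_id, urls in visits_by_client.items()
        if urls & seeds
    }

    # Candidate pages: everything else those victims accessed.
    candidates = set()
    for client_id in victims:
        candidates |= visits_by_client[client_id] - seeds

    return victims, candidates
```

Under these assumptions, only the returned candidates would be passed to the slow analyzer instead of the full access log, and any candidates confirmed as malicious could be added back to the seed set for another round.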
Keywords
Web sites; computer crime; invasive software; relational databases; unsupervised learning; GuidedTracker system; input URL stream; malicious URL; malicious web pages; self-learning; suspicious malicious pages; visit relation database; Detectors; Information systems; Inspection; Security; Uniform resource locators; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Global Communications Conference (GLOBECOM), 2014 IEEE
Conference_Location
Austin, TX
Type
conf
DOI
10.1109/GLOCOM.2014.7036867
Filename
7036867