DocumentCode :
172965
Title :
CURLA: Cloud-Based Spam URL Analyzer for Very Large Datasets
Author :
Zawoad, Shams ; Hasan, Ragib ; Haque, Md Mohaiminul ; Warner, Gary
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, Birmingham, AL, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
729
Lastpage :
736
Abstract :
URL blacklisting is a widely used technique for blocking phishing websites. To prepare an effective blacklist, it is necessary to analyze possible threats and include the identified malicious sites in the blacklist. Spam emails are a good source of suspected phishing websites. However, the number of URLs gathered from spam emails is quite large. Fetching and analyzing the content of so many websites is very expensive given limited computing and storage resources. Moreover, a high percentage of URLs extracted from spam emails refer to the same website. Hence, preserving the contents of all the websites wastes a significant amount of storage. To solve the problem of massive computing and storage resource requirements, we propose and develop CURLA, a Cloud-based spam URL Analyzer built on top of Amazon Elastic Compute Cloud (EC2) and Amazon Simple Queue Service (SQS). CURLA allows processing a large number of spam-based URLs in parallel, which reduces cost relative to establishing equally capable local infrastructure. Our system builds a database of unique spam-based URLs and accumulates the content of these unique websites in a central repository, which can later be used for detecting phishing or other counterfeit websites. We show the effectiveness of our proposed architecture using real-life spam-based URL data.
Keywords :
Web sites; cloud computing; computer crime; unsolicited e-mail; very large databases; Amazon elastic computer cloud; Amazon simple queue service; CURLA; EC2; SQS; URL blacklisting; cloud-based spam URL analyzer; counterfeit Web sites detection; fetching; malicious sites; massive computing; phishing Web site blocking; real-life spam-based URL data; spam emails; storage resource requirements; storage resources; storage waste; very large datasets; Cloud computing; Databases; Electronic mail; Parallel processing; Queueing analysis; Uniform resource locators; Cloud; Parallel Architecture; Phishing; Spam URL;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
Type :
conf
DOI :
10.1109/CLOUD.2014.102
Filename :
6973808