  • DocumentCode
    172965
  • Title
    CURLA: Cloud-Based Spam URL Analyzer for Very Large Datasets
  • Author
    Zawoad, Shams ; Hasan, Ragib ; Haque, Md Mohaiminul ; Warner, Gary
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, Birmingham, AL, USA
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    729
  • Lastpage
    736
  • Abstract
    URL blacklisting is a widely used technique for blocking phishing websites. To prepare an effective blacklist, it is necessary to analyze possible threats and include the identified malicious sites in the blacklist. Spam emails are a good source of suspected phishing websites. However, the number of URLs gathered from spam emails is quite large, and fetching and analyzing the content of so many websites is very expensive given limited computing and storage resources. Moreover, a high percentage of URLs extracted from spam emails refer to the same website, so preserving the contents of all the websites wastes a significant amount of storage. To solve the problem of massive computing and storage resource requirements, we propose and develop CURLA, a Cloud-based spam URL Analyzer built on top of Amazon Elastic Compute Cloud (EC2) and Amazon Simple Queue Service (SQS). CURLA allows a large number of spam-based URLs to be processed in parallel, which reduces cost relative to establishing an equally capable local infrastructure. Our system builds a database of unique spam-based URLs and accumulates the content of these unique websites in a central repository, which can later be used to detect phishing or other counterfeit websites. We show the effectiveness of our proposed architecture using real-life spam-based URL data.
  • Keywords
    Web sites; cloud computing; computer crime; unsolicited e-mail; very large databases; Amazon elastic computer cloud; Amazon simple queue service; CURLA; EC2; SQS; URL blacklisting; cloud-based spam URL analyzer; counterfeit Web sites detection; fetching; malicious sites; massive computing; phishing Web site blocking; real-life spam-based URL data; spam emails; storage resource requirements; storage resources; storage waste; very large datasets; Cloud computing; Databases; Electronic mail; Parallel processing; Queueing analysis; Uniform resource locators; Cloud; Parallel Architecture; Phishing; Spam URL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5062-1
  • Type
    conf
  • DOI
    10.1109/CLOUD.2014.102
  • Filename
    6973808