  • DocumentCode
    2596906
  • Title
    Fast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data
  • Author
    Tang, Yuchun ; Krasser, Sven ; Judge, Paul ; Zhang, Yan-Qing
  • Author_Institution
    Secure Comput. Corp., Alpharetta, GA
  • fYear
    2006
  • fDate
    17-20 Nov. 2006
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Unsolicited commercial or bulk emails, or emails containing viruses, currently pose a great threat to the utility of email communications. A recent solution for filtering is reputation systems that can assign a value of trust to each IP address sending email messages. By analyzing the query patterns of each participating node, reputation systems can calculate a reputation score for each queried IP address and serve as a platform for global collaborative spam filtering for all participating nodes. In this research, we explore a behavioral classification approach based on spectral sender characteristics retrieved from such global messaging patterns. Due to the large number of bad senders, this classification task has to cope with highly imbalanced data. In order to solve this challenging problem, a novel granular support vector machine - boundary alignment algorithm (GSVM-BA) is designed. GSVM-BA looks for the optimal decision boundary by repeatedly removing positive support vectors from the training dataset and rebuilding another SVM. Compared to the original SVM algorithm with cost-sensitive learning, GSVM-BA demonstrates superior performance on spam IP detection, in terms of both effectiveness and efficiency.
  • Keywords
    pattern classification; support vector machines; unsolicited e-mail; behavioral classification; granular support vector machine; highly imbalanced mail server behavior data; spam filtering; spam sender detection; unsolicited email; Algorithm design and analysis; Computer science; Electronic mail; Filtering; Home appliances; International collaboration; Pattern analysis; Support vector machine classification; Unsolicited electronic mail; class imbalance; data mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Collaborative Computing: Networking, Applications and Worksharing, 2006. CollaborateCom 2006. International Conference on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    1-4244-0429-0
  • Electronic_ISBN
    1-4244-0429-0
  • Type
    conf
  • DOI
    10.1109/COLCOM.2006.361856
  • Filename
    4207528