  • DocumentCode
    1734706
  • Title
    Adversarial Spam Detection Using the Randomized Hough Transform-Support Vector Machine
  • Author
    DeBarr, Dave; Sun, Hao; Wechsler, Harry
  • Author_Institution
    Comput. Sci. Dept., George Mason Univ., Fairfax, VA, USA
  • Volume
    1
  • fYear
    2013
  • Firstpage
    299
  • Lastpage
    304
  • Abstract
    In public e-mail systems, users can be asked to help annotate messages for training spam detection models. For example, a selected user can occasionally be asked whether a randomly selected message destined for their inbox is spam or not spam. Unfortunately, the solicited user may be an internal threat with malicious intent. Like an adversary, such a user may want to introduce label noise: either to make the spam classifier believe a spam message is not spam (to ensure delivery of similar messages), or to make it believe a non-spam message is spam (to prevent delivery of similar messages). Inspired by the Randomized Hough Transform (RHT), a set of Support Vector Machines (SVMs) is trained on randomly chosen data subsets, and the SVMs vote to identify training examples that have been mislabeled. The labels of messages that on average fall on the wrong side of the decision boundary are flipped, and a final SVM model is trained using the modified labels. Two data sets are used to evaluate the proposed RHT-SVM method: the TREC 2007 Spam Track data and the CEAS 2008 Spam data. To preserve the time-ordered nature of the data stream, the first 10% of the messages in each data set are used for training and the remaining 90% for evaluation. Separate adversarial experiments are conducted for flipping spam labels and non-spam labels. For each of 10 iterations, labels are flipped for a randomly selected 5% subset of the training data, and the final RHT-SVM is evaluated on the test set. The performance of RHT-SVM is compared to that of the state-of-the-art Reject On Negative Impact (RONI) algorithm. RHT-SVM shows an average 9.3% relative increase in the F measure compared to RONI (99.0% versus 90.6%), as well as significant improvements in other evaluation metrics. The flip sensitivity of RHT-SVM is 95.9% and the flip specificity is 99.0%. The RHT-SVM experiments also take over 90% less time to complete than the RONI experiments (20 minutes per experiment instead of 360 minutes).
  • Keywords
    Hough transforms; pattern classification; security of data; support vector machines; unsolicited e-mail; CEAS 2008 spam data; RHT-SVM; RONI; TREC 2007 spam track data; adversarial spam detection; internal threat; malicious intent; nonspam labels; nonspam message; public e-mail systems; randomized Hough transform-support vector machine; randomly selected message; reject on negative impact algorithm; spam classifier; spam labels; Kernel; Noise; Support vector machines; Training; Training data; Transforms; Unsolicited electronic mail; Adversarial Label Noise; Adversarial Learning; Spam Detection; Support Vector Machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Applications (ICMLA), 2013 12th International Conference on
  • Conference_Location
    Miami, FL
  • Type
    conf
  • DOI
    10.1109/ICMLA.2013.61
  • Filename
    6784631
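
The label-cleaning procedure summarized in the abstract (train several SVMs on random subsets, let them vote, flip the labels that fall on the wrong side of the boundary on average, then train a final model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear SVM trained by batch subgradient descent, the function names, the number of voters, the subset fraction, and the two-cluster toy data that stands in for e-mail features. Flip sensitivity is taken here to mean the share of adversarially flipped labels that the vote restores, and flip specificity the share of untouched labels that it leaves alone.

```python
import random

def decision(w, x):
    """Signed score w.x + b (the bias b is the last entry of w)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=100):
    """Batch subgradient descent on the L2-regularized hinge loss.
    A simple stand-in for a real SVM solver; the bias is regularized too."""
    n, dim = len(X), len(X[0])
    w = [0.0] * (dim + 1)
    for _ in range(iters):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * decision(w, xi) < 1.0:      # point violates the margin
                for j in range(dim):
                    grad[j] -= yi * xi[j] / n
                grad[dim] -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def rht_svm_clean(X, y, n_voters=15, subset_frac=0.5, rng=None):
    """Train voters on random subsets and flip every label whose summed
    signed margin over all voters is negative (wrong side on average)."""
    rng = rng or random.Random(0)
    n = len(X)
    k = max(2, int(subset_frac * n))
    totals = [0.0] * n
    for _ in range(n_voters):
        idx = rng.sample(range(n), k)
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            totals[i] += y[i] * decision(w, X[i])
    return [-yi if t < 0 else yi for yi, t in zip(y, totals)]

def run_demo(seed=0, n_per_class=100, flip_frac=0.05):
    """Toy experiment: flip 5% of the training labels (spam only), clean
    them by voting, and train the final model on the cleaned labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_per_class)]     # "spam"
    X += [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(n_per_class)]  # "non-spam"
    y_true = [1] * n_per_class + [-1] * n_per_class
    n_flip = int(flip_frac * len(X))
    flipped = set(rng.sample(range(n_per_class), n_flip))   # spam indices only
    y_noisy = [-t if i in flipped else t for i, t in enumerate(y_true)]
    y_clean = rht_svm_clean(X, y_noisy, rng=rng)
    changed = {i for i in range(len(X)) if y_clean[i] != y_noisy[i]}
    sensitivity = len(changed & flipped) / len(flipped)
    specificity = 1 - len(changed - flipped) / (len(X) - len(flipped))
    final_model = train_linear_svm(X, y_clean)
    return sensitivity, specificity, final_model

if __name__ == "__main__":
    sens, spec, _ = run_demo()
    print(f"flip sensitivity={sens:.3f}  flip specificity={spec:.3f}")
```

Summing the signed margins rather than counting hard votes is one reading of "on the average appear on the wrong side of the decision boundary"; a majority count over the voters would be an equally plausible variant.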