DocumentCode :
2008102
Title :
Highly Scalable SVM Modeling with Random Granulation for Spam Sender Detection
Author :
Tang, Yuchun ; He, Yuanchen ; Krasser, Sven
Author_Institution :
Secure Comput. Corp., Alpharetta, GA
fYear :
2008
fDate :
11-13 Dec. 2008
Firstpage :
659
Lastpage :
664
Abstract :
Spam sender detection based on email subject data is a complex, large-scale text mining task. The dataset consists of email subject lines and the IP address of each email's sender. Such an application calls for a fast and accurate classifier. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies bootstrapping to extract a number of subsets of samples from the original training dataset. Each training subset is then projected into a feature subspace randomly selected from the original feature space. We call such a subset of samples in such a feature subspace a granule. A local SVM is then modeled in each granule. A new sample is first projected into each granule, where the local SVM is fired to make a prediction. All SVM predictions are then aggregated by the Bayesian Sum Rule for a final decision. GSVM-RAND is easy to parallelize and hence efficient and highly scalable. It is also effective because it integrates a large number of weak, low-correlated local SVMs.
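The granulation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names (`train_linear_svm`, `train_gsvm_rand`, `spam_probability`, `predict`), the use of a linear SVM trained by stochastic subgradient descent, the logistic squashing of SVM scores into pseudo-probabilities, and the reading of the Bayesian Sum Rule as an average of per-granule posterior estimates are all assumptions made for the example. The record does not specify the kernel, the granule count, the subspace size, or how email subjects are turned into numeric features.

```python
import math
import random


def train_linear_svm(X, y, epochs=30, lr=0.05, lam=0.01, rng=None):
    """Linear SVM via stochastic subgradient descent on the L2-regularized
    hinge loss. Labels must be +1 or -1. (Assumed stand-in for a local SVM.)"""
    rng = rng or random.Random(0)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularization shrinkage
            if y[i] * score < 1.0:  # sample violates the margin
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def train_gsvm_rand(X, y, n_granules=15, n_features=3, seed=0):
    """Build granules: a bootstrap sample of rows projected onto a random
    feature subset, with one local SVM trained per granule."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    granules = []
    for _ in range(n_granules):
        rows = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        cols = sorted(rng.sample(range(d), n_features))  # random feature subspace
        Xg = [[X[i][j] for j in cols] for i in rows]
        yg = [y[i] for i in rows]
        w, b = train_linear_svm(Xg, yg, rng=rng)
        granules.append((cols, w, b))
    return granules


def spam_probability(granules, x):
    """Project x into each granule, score it with the local SVM, and average
    the logistic-squashed scores (assumed form of the sum rule)."""
    probs = []
    for cols, w, b in granules:
        score = sum(wj * x[j] for wj, j in zip(w, cols)) + b
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return sum(probs) / len(probs)


def predict(granules, x):
    """Return +1 (spam sender) or -1 (legitimate sender)."""
    return 1 if spam_probability(granules, x) >= 0.5 else -1


if __name__ == "__main__":
    # Synthetic toy data: six numeric features, two well-separated classes.
    rng = random.Random(1)
    X = [[rng.gauss(m, 0.5) for _ in range(6)] for m in [1.0] * 20 + [-1.0] * 20]
    y = [1] * 20 + [-1] * 20
    model = train_gsvm_rand(X, y)
    print(predict(model, [2.0] * 6), predict(model, [-2.0] * 6))
```

Because each granule is trained independently of the others, the loop in `train_gsvm_rand` could be distributed across workers without coordination, which is the property the abstract points to when it calls the method easy to parallelize.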
Keywords :
Bayes methods; data mining; feature extraction; learning (artificial intelligence); pattern classification; random processes; sampling methods; support vector machines; text analysis; unsolicited e-mail; Bayesian sum rule; IP address; bootstrapping method; complex large-scale text mining; email subject data; feature subspace random selection; high scalable SVM modeling; random granulation; spam sender detection; subset extraction; support vector machine; Bayesian methods; Floods; Helium; Large-scale systems; Machine learning; Machine learning algorithms; Support vector machine classification; Support vector machines; Text mining; Unsolicited electronic mail; classification ensembling; data mining; email spam detection; granular computing; information security; machine learning; svm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-0-7695-3495-4
Type :
conf
DOI :
10.1109/ICMLA.2008.51
Filename :
4725045