Title :
A Cluster-based Approach to Filtering Spam under Skewed Class Distributions
Author :
Hsiao, Wen-Feng ; Chang, Te-Ming ; Hu, Guo-Hsin
Author_Institution :
Dept. of Inf. Manage., Nat. Pingtung Inst. of Commerce
Abstract :
This research proposes a classification approach that improves the effectiveness of spam filtering when class distributions are skewed. The proposed clustering-based classifier first clusters documents into several groups and then extracts an equal number of keywords from each group, which alleviates the problems caused by skewed class distributions. Experiments are conducted to validate the classifier, and the results show that it deals effectively with skewed class distributions in the spam filtering task.
Keywords :
data mining; pattern classification; pattern clustering; text analysis; unsolicited e-mail; classification; document clustering; keyword extraction; skewed class distribution; spam filtering; text mining; Boosting; Decision trees; Frequency; Information filtering; Information filters; Information management; Matched filters; Support vector machines; Text mining; Unsolicited electronic mail
Conference_Title :
System Sciences, 2007. HICSS 2007. 40th Annual Hawaii International Conference on
Conference_Location :
Waikoloa, HI
Electronic_ISBN :
1530-1605
DOI :
10.1109/HICSS.2007.7