DocumentCode :
1955370
Title :
Information Theory Based Feature Valuing for Logistic Regression for Spam Filtering
Author :
Qi, Haoliang ; He, Xiaoning ; Han, Yong ; Yang, Muyun ; Li, Sheng
Author_Institution :
Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
166
Lastpage :
169
Abstract :
Discriminative learning models such as Logistic Regression (LR) have shown good performance in spam filtering tasks. Most previous research on LR has used binary features, which discards much useful information. To overcome this problem, an information theory based feature valuing method for LR is presented as a replacement for traditional binary features. The effectiveness of our approach has been evaluated on the TREC, CEAS, and SEWM test sets. Results show that the proposed method outperforms traditional binary features on most test sets.
Keywords :
information theory; logistics; regression analysis; unsolicited e-mail; binary feature; discriminative learning; feature valuing; information theory; logistic regression; spam filtering; Feature extraction; Filtering theory; Logistics; Support vector machines; Unsolicited electronic mail; feature valuing; information theory; logistic regression; spam filtering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.65
Filename :
5681605