Title :
Information Theory Based Feature Valuing for Logistic Regression for Spam Filtering
Author :
Qi, Haoliang ; He, Xiaoning ; Han, Yong ; Yang, Muyun ; Li, Sheng
Author_Institution :
Comput. Sci. & Technol. Dept., Heilongjiang Inst. of Technol., Harbin, China
Abstract :
Discriminative learning models such as Logistic Regression (LR) have shown good performance in spam filtering tasks. Most previous research on LR has used binary features, which discards much useful information. To overcome this problem, an information-theory-based feature valuing method for LR is presented as a replacement for traditional binary features. The effectiveness of our approach has been evaluated on the TREC, CEAS, and SEWM test sets. Results show that the proposed method outperforms traditional binary features on most of the test sets.
Keywords :
information theory; logistics; regression analysis; unsolicited e-mail; binary feature; discriminative learning; feature valuing; logistic regression; spam filtering; Feature extraction; Filtering theory; Logistics; Support vector machines; Unsolicited electronic mail;
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.65