DocumentCode :
1866419
Title :
Less naive Bayes spam detection
Author :
Yang, Hongming ; Stassen, Maurice ; Tjalkens, Tjalling
Author_Institution :
Eindhoven Univ. of Technol., Eindhoven
fYear :
2007
fDate :
Jan. 29 2007-Feb. 2 2007
Firstpage :
388
Lastpage :
392
Abstract :
We consider a binary classification problem with a feature vector of high dimensionality. Spam mail filters are a popular example hereof. A naive Bayes filter assumes conditional independence of the feature vector components. We use the context tree weighting method as an application of the minimum description length principle to allow for dependencies between the feature vector components. It turns out that, due to the limited amount of training data, we must assume conditional independence between groups of vector components. We consider several ad-hoc algorithms to find good groupings and good conditional models.
Keywords :
Bayes methods; information filtering; information filters; trees (mathematics); unsolicited e-mail; Bayes filter; Bayes spam detection; binary classification; context tree weighting; feature vector components; spam mail filters; training data; Electronic mail; Filters; Postal services; Training data; Unsolicited electronic mail;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Theory and Applications Workshop, 2007
Conference_Location :
La Jolla, CA
Print_ISBN :
978-0-615-15314-8
Type :
conf
DOI :
10.1109/ITA.2007.4357608
Filename :
4357608