Title :
Filtering spam e-mail with Generalized Additive Neural Networks
Author :
Du Toit, Tiny ; Kruger, Hennie
Author_Institution :
Sch. of Comput., Stat. & Math. Sci., North-West Univ., Potchefstroom, South Africa
Abstract :
Some of the major security risks associated with spam e-mail are the spreading of computer viruses and the facilitation of phishing exercises. Spam is therefore regarded as one of the prominent security threats in modern organizations. Security controls, such as spam filtering techniques, have become increasingly important to protect information and information assets. In this paper, the performance of a Generalized Additive Neural Network on a publicly available e-mail corpus is investigated in the context of statistical spam filtering. The neural network is compared to a Naive Bayesian classifier and a Memory-based technique. Generalized Additive Neural Networks have a number of advantages over neural networks in general. An automated construction algorithm performs feature and model selection simultaneously and produces results that can be interpreted by a graphical method. This algorithm is powerful and effective, and achieves high accuracy compared to other non-linear model selection methods. The paper also considers the impact of different feature set sizes using cost-sensitive measures. These criteria are sensitive to the cost difference between the two common types of errors made by filtering systems. Experiments show better performance than the Naive Bayes and Memory-based classifiers when misclassified legitimate e-mails are assigned the same cost as misclassified spam. This result suggests that Generalized Additive Neural Networks may be used to flag spam e-mails in order to prioritize the reading of messages.
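The abstract does not name the cost-sensitive measures it uses. As an illustration only, the sketch below assumes two measures that are common in the spam filtering literature, weighted accuracy and total cost ratio, in which blocking a legitimate e-mail costs lam times as much as letting a spam through. The function names, the parameter lam and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def weighted_accuracy(legit_ok, spam_ok, n_legit, n_spam, lam=1.0):
    """Accuracy in which each legitimate e-mail counts as lam messages.

    legit_ok / spam_ok are the correctly classified counts per class;
    n_legit / n_spam are the class totals. (Assumed definition.)
    """
    return (lam * legit_ok + spam_ok) / (lam * n_legit + n_spam)


def total_cost_ratio(legit_as_spam, spam_as_legit, n_spam, lam=1.0):
    """Cost of using no filter divided by the cost of using the filter.

    Values above 1 mean the filter is better than no filter at all.
    (Assumed definition.)
    """
    cost = lam * legit_as_spam + spam_as_legit
    return math.inf if cost == 0 else n_spam / cost


# Hypothetical counts: 100 legitimate and 50 spam messages,
# with 2 legitimate messages blocked and 10 spams missed.
wacc_equal = weighted_accuracy(98, 40, 100, 50, lam=1)
tcr_equal = total_cost_ratio(2, 10, 50, lam=1)
wacc_strict = weighted_accuracy(98, 40, 100, 50, lam=9)
tcr_strict = total_cost_ratio(2, 10, 50, lam=9)
```

With lam = 1 both error types cost the same, which corresponds to the flagging scenario mentioned at the end of the abstract. A larger lam penalizes the loss of legitimate mail more heavily, so the same filter scores a lower total cost ratio.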
Keywords :
Bayes methods; computer viruses; information filtering; neural nets; unsolicited e-mail; Naive Bayesian classifier; automated construction algorithm; e-mail corpus; filtering spam e-mail; generalized additive neural networks; graphical method; information assets; information protection; memory based technique; phishing exercises; security controls; security risks; security threats; spam filtering techniques; statistical spam filtering; Accuracy; Additives; Bayesian methods; Biological neural networks; Unsolicited electronic mail; Generalized Additive Neural Network; Memory-based classifier; Neural Network; Security risk; Spam; Spam filtering;
Conference_Title :
Information Security for South Africa (ISSA), 2012
Conference_Location :
Johannesburg, Gauteng
Print_ISBN :
978-1-4673-2160-0
DOI :
10.1109/ISSA.2012.6320446