Title :
Flow-level spam modelling using separate data sources
Author :
Luckner, Marcin ; Filasiak, Robert
Author_Institution :
Fac. of Math. & Inf. Sci., Warsaw Univ. of Technol., Warsaw, Poland
Abstract :
Spam detection based on flow-level statistics is a new approach in anti-spam techniques. The approach reduces the amount of collected data but can still obtain relatively good results in a spam detection task. The main problems in the approach are the selection of flow-level features that describe spam and the discovery of discrimination rules. In this work, a flow-level model of spam is presented. The model describes spam subclasses and provides information about the major features of a spam detection task. The model is the basis for decision trees that detect spam. The analysis of detectors, which were learned from data collected from different mail servers, results in a universal spam description consisting of the most significant features. Flows described by the selected features and collected on a Broadband Remote Access Server were analysed by an ensemble of the created classifiers. The ensemble detected major sources of spam among senders' IP addresses.
Keywords :
decision trees; statistical analysis; unsolicited e-mail; antispam technique; broadband remote access server; flow-level feature; flow-level spam modelling; flow-level statistics; mail server; spam detection task; spam subclasses; Accuracy; Data models; Decision trees; IP networks; Servers; Unsolicited electronic mail; Anomaly detection; Flow analysis; Spam detection
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on
Conference_Location :
Kraków