Title :
Low cost log statistical filtering in the real world chained tokens and Bayesian filters
Author :
Havens, Russel W. ; Teng, Chia-Chi
Author_Institution :
Sch. of Technol., Brigham Young Univ., Provo, UT, USA
Abstract :
Log files are valuable resources that help systems administrators troubleshoot and prevent failures. Bayesian filters can serve as effective log entry filters, reducing the log noise that often makes today's huge log files daunting. Three off-the-shelf, open source Bayesian spam filters (SpamAssassin, SpamBayes and Bogofilter) are tested on how effectively they differentiate syslog entries in a corpus taken from production Linux servers at the School of Technology, Brigham Young University. Additionally, the effectiveness of word chaining, stacked word chaining and the normalization of numbers is discussed and quantified. Chaining words into two- or three-word chains significantly improves the effectiveness of two of the filters, most notably SpamAssassin, whose differentiation of the log corpus improved markedly. Preliminary experiments confirm that these filters can effectively process typical system logs and have potential for future applications.
Keywords :
Bayes methods; file organisation; information filtering; statistical analysis; Bayesian filter; Bogofilter; SpamAssassin; SpamBayes; chained token; log entry filter; log files; low cost log statistical filtering; open source Bayesian spam filter; stacked word chaining; syslog entries; Decision support systems; Organizations; Unsolicited electronic mail; Bayesian content filter; log analysis; spam filter; word chaining;
Conference_Title :
Applications of Digital Information and Web Technologies (ICADIWT), 2011 Fourth International Conference on the
Conference_Location :
Stevens Point, WI
Print_ISBN :
978-1-4244-9824-6
DOI :
10.1109/ICADIWT.2011.6041394
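
The word chaining and number normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define these steps precisely, so everything here is an assumption: "chaining" is read as joining n adjacent words into one token, "stacked" chaining as emitting chains of every length from 1 up to a maximum, and number normalization as replacing each run of digits with a placeholder. The function names, the "_" separator, the "N" placeholder and the sample log line are all hypothetical and are not taken from the paper or from any of the three filters.

```python
import re


def normalize_numbers(line, placeholder="N"):
    # Assumption: each run of digits collapses to a single placeholder,
    # so PIDs, ports and address octets no longer create unique tokens.
    return re.sub(r"\d+", placeholder, line)


def chain_tokens(words, n, sep="_"):
    # Join every window of n adjacent words into one chained token.
    return [sep.join(words[i:i + n]) for i in range(len(words) - n + 1)]


def stacked_chains(words, max_n=3, sep="_"):
    # Assumption: "stacked" means chains of length 1..max_n are emitted together.
    tokens = []
    for n in range(1, max_n + 1):
        tokens.extend(chain_tokens(words, n, sep))
    return tokens


# Made-up log line for illustration only; not from the paper's corpus.
entry = "myservice[2412]: connection from 10.0.0.5 port 22 refused"
words = normalize_numbers(entry).split()
two_word_chains = chain_tokens(words, 2)
all_chains = stacked_chains(words, max_n=3)
```

The resulting token list could then be written out as whitespace-separated text and passed to a Bayesian filter in place of the raw log entry, so that the filter's own tokenizer treats each chain as a single word.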