Title :
An Ensemble Learning Framework for Online Web Spam Detection
Author :
Cailing Dong ; Bin Zhou
Author_Institution :
Dept. of Inf. Syst., Univ. of Maryland, Baltimore County, Baltimore, MD, USA
Abstract :
Most existing studies of web spam detection explicitly or implicitly assume that the detection process is performed offline on the search engine side. However, we argue that online web spam detection is also useful in some specific scenarios. We propose to implement a web browser plug-in to support online web spam detection. Three different sets of spam labeling data are collected and adopted for learning a reliable web spam classifier. An empirical study is conducted on the benchmark web spam data collection. The statistical analysis of the data set verifies the necessity of online web spam detection. The performance of the proposed ensemble learning framework for online web spam detection is also examined, and it meets the requirements of online web spam detection.
Keywords :
Internet; learning (artificial intelligence); online front-ends; pattern classification; search engines; statistical analysis; unsolicited e-mail; Web browser plug-in; Web spam classifier; Web spam data collection; ensemble learning framework; online Web spam detection; search engine side; spam labeling data; statistical analysis; Browsers; Detectors; Labeling; Search engines; Servers; Unsolicited electronic mail; Web pages; ensemble learning; online web spam detection; personalization;
Conference_Title :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICMLA.2013.15