DocumentCode :
2427523
Title :
Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification
Author :
Geng, Guang-Gang ; Wang, Chun-Heng ; Li, Qiu-Dan ; Xu, Lei ; Jin, Xiao-Bo
Author_Institution :
Chinese Acad. of Sci., Beijing
Volume :
4
fYear :
2007
fDate :
24-27 Aug. 2007
Firstpage :
583
Lastpage :
587
Abstract :
Anti-spam has become one of the top challenges for Web search. In this paper, we treat Web spam detection as a binary classification problem. Because reputable pages are easier to obtain than spam pages on the Web, we adopt an ensemble under-sampling classification strategy that takes full advantage of the information contained in the large number of reputable Web sites. The strategy is based on the spamicity predicted by each sub-classifier, in which both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark show that the ensemble strategy effectively improves Web spam detection performance.
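The abstract does not spell out the procedure, so the following is only a minimal sketch of one common form of ensemble under-sampling, not the authors' implementation. It assumes the reputable (majority) samples are split into disjoint parts, each part is paired with all spam samples to train one sub-classifier, and the sub-classifiers' spamicity scores are averaged. The centroid-based scorer, the function names, the averaging rule, and the two-dimensional toy features are all illustrative assumptions; the paper uses content-based and link-based features.

```python
import random
from statistics import mean


def train_centroid_scorer(X, y):
    """Hypothetical base learner: spamicity from distances to class centroids."""
    def centroid(rows):
        return [mean(col) for col in zip(*rows)]

    spam_c = centroid([x for x, label in zip(X, y) if label == 1])
    ham_c = centroid([x for x, label in zip(X, y) if label == 0])

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    def spamicity(x):
        d_spam, d_ham = dist(x, spam_c), dist(x, ham_c)
        total = d_spam + d_ham
        # Closer to the spam centroid -> score nearer 1.
        return 0.5 if total == 0 else d_ham / total

    return spamicity


def train_ensemble(X, y, seed=0):
    """Under-sample the majority class into disjoint parts, one sub-classifier each."""
    rng = random.Random(seed)
    spam = [x for x, label in zip(X, y) if label == 1]
    ham = [x for x, label in zip(X, y) if label == 0]
    rng.shuffle(ham)
    # Assumption: each part is roughly the size of the spam set.
    n_parts = max(1, len(ham) // len(spam))
    scorers = []
    for i in range(n_parts):
        part = ham[i::n_parts]
        labels = [1] * len(spam) + [0] * len(part)
        scorers.append(train_centroid_scorer(spam + part, labels))

    def ensemble_spamicity(x):
        # Assumption: combine sub-classifiers by averaging their scores.
        return mean(s(x) for s in scorers)

    return ensemble_spamicity, len(scorers)


# Toy data: 40 reputable samples near (0, 0), 10 spam samples near (5, 5).
data_rng = random.Random(42)
ham_rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
spam_rows = [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(10)]
X = ham_rows + spam_rows
y = [0] * 40 + [1] * 10

score, n_sub = train_ensemble(X, y)
```

Here `score([5, 5])` should land well above 0.5 and `score([0, 0])` well below it. Every reputable sample is used by exactly one sub-classifier, so no majority-class information is discarded even though each sub-classifier trains on a balanced set.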
Keywords :
Web sites; Web search; Web spam detection; binary classification problem; content-based features; link-based features; undersampling classification strategy; Automation; Boosting; Explosives; Feature extraction; Intelligent systems; Laboratories; Search engines; Unsolicited electronic mail; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
Type :
conf
DOI :
10.1109/FSKD.2007.207
Filename :
4406454