DocumentCode :
2760788
Title :
Web spam detection based on discriminative content and link features
Author :
Mahmoudi, Maryam ; Yari, Alireza ; Khadivi, Shahram
Author_Institution :
IT Res. Fac., Iran Telecom Res. Center, Tehran, Iran
fYear :
2010
fDate :
4-6 Dec. 2010
Firstpage :
542
Lastpage :
546
Abstract :
Spam detection is a crucial task in web information retrieval systems. The dynamic nature of information resources, together with continuous changes in users' information demands, makes web spam detection challenging. Researchers from different backgrounds have proposed many methods to tackle the problem of spam web pages. In this research, we study the feature space of web spam detection to identify the most effective and discriminative features. We then design a spam detection system that employs a minimal set of features while performing the same as, or very close to, a system that uses the complete feature set. The experimental results show that the number of features can be reduced in a clever way while the accuracy of the system remains intact or even improves.
Keywords :
Internet; information retrieval systems; learning (artificial intelligence); security of data; unsolicited e-mail; Web information retrieval systems; Web spam detection; discriminative content; link features; Accuracy; Artificial neural networks; Classification algorithms; Feature extraction; Search engines; Support vector machines; Web pages; Search engine; classification; data mining; feature selection; web spam;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Telecommunications (IST), 2010 5th International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4244-8183-5
Type :
conf
DOI :
10.1109/ISTEL.2010.5734084
Filename :
5734084