Title :
Hybrid spamicity score approach to web spam detection
Author :
Algur, Siddu P.; Pendari, N.T.
Author_Institution :
Dept. of Inf. Sci. & Eng., BVBCET, Hubli, India
Abstract :
Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Fundamentally, web spam is designed to pollute search engines and corrupt the user experience by driving traffic to particular spammed web pages, regardless of the merits of those pages. Recently, there has been a dramatic increase in the amount of web spam, leading to a degradation of search results. Most existing web spam detection methods are supervised and require a large set of training web pages. The proposed system studies the problem of unsupervised web spam detection. It introduces the notion of spamicity to measure how likely a page is to be spam. Spamicity offers a more flexible measure than traditional supervised classification methods. In the proposed system, link spam and content spam detection techniques are combined to determine the spamicity score of a web page. A threshold, set by empirical analysis, then classifies the web page as spam or non-spam.
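The abstract does not say which link and content features are used or how they are weighted. Purely as an illustrative sketch, assuming four hypothetical indicators (`keyword_density`, `anchor_text_fraction`, `reciprocal_link_ratio`, `outlink_density`) already normalized to [0, 1], an unweighted mean within each family, and a linear blend of the two families controlled by a hypothetical weight `w_content`, a hybrid spamicity score with a threshold rule might look like this. None of these names or formulas come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Hypothetical per-page spam indicators, each pre-normalized to [0, 1]."""
    # Content-spam indicators (illustrative assumptions, not the paper's features).
    keyword_density: float        # share of all terms taken by the most repeated term
    anchor_text_fraction: float   # share of visible text that sits inside links
    # Link-spam indicators (illustrative assumptions, not the paper's features).
    reciprocal_link_ratio: float  # share of outlinks whose targets link back
    outlink_density: float        # outlink count, scaled to [0, 1]

    def __post_init__(self) -> None:
        # Reject indicators outside [0, 1] so the blended score stays in [0, 1].
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def content_score(f: PageFeatures) -> float:
    # Unweighted mean of the hypothetical content-spam indicators.
    return (f.keyword_density + f.anchor_text_fraction) / 2


def link_score(f: PageFeatures) -> float:
    # Unweighted mean of the hypothetical link-spam indicators.
    return (f.reciprocal_link_ratio + f.outlink_density) / 2


def spamicity(f: PageFeatures, w_content: float = 0.5) -> float:
    # Hybrid score: linear blend of the content and link scores, in [0, 1].
    if not 0.0 <= w_content <= 1.0:
        raise ValueError("w_content must be in [0, 1]")
    return w_content * content_score(f) + (1 - w_content) * link_score(f)


def classify(f: PageFeatures, threshold: float = 0.5, w_content: float = 0.5) -> str:
    # 0.5 is a placeholder; the paper sets its threshold by empirical analysis.
    return "spam" if spamicity(f, w_content) >= threshold else "non-spam"


if __name__ == "__main__":
    page = PageFeatures(0.9, 0.7, 0.2, 0.0)
    print(spamicity(page))                  # roughly 0.45
    print(classify(page))                   # non-spam
    print(classify(page, w_content=1.0))    # spam (content evidence only)
```

Keeping the two families separate makes the trade-off explicit: a page with heavy keyword stuffing but a clean link profile lands near the threshold under equal weighting, and the weight and threshold are the knobs an empirical analysis would tune.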
Keywords :
Web sites; pattern classification; search engines; security of data; Web page classification; Web page training; hybrid spamicity score approach; search engines; spammed Web pages; unsupervised Web spam detection; user experience; Detectors; Informatics; Pattern recognition; Redundancy; Search engines; Unsolicited electronic mail; Web pages; spamdexing; spamicity score; web spam detection;
Conference_Title :
Pattern Recognition, Informatics and Medical Engineering (PRIME), 2012 International Conference on
Conference_Location :
Salem, Tamil Nadu
Print_ISBN :
978-1-4673-1037-6
DOI :
10.1109/ICPRIME.2012.6208284