Title :
Lightweight Clustering Methods for Webspam Demotion
Author :
Largillier, Thomas ; Peyronnet, Sylvain
Author_Institution :
LRI, Univ Paris-Sud, Orsay, France
Date :
Aug. 31 2010-Sept. 3 2010
Abstract :
To respond quickly to a query, the major search engines rely on several mechanisms. One of them is to rank web pages according to their importance, regardless of the semantics of the page. Indeed, relevance to a query is not enough to provide high-quality results, and popularity is used to arbitrate between equally relevant web pages. Webspam broadly denotes any web page created with the sole purpose of fooling ranking algorithms such as PageRank. The aim of Webspam is to promote a target page by increasing its rank. Spotting and discarding Webspam is an important issue for Web search engines, so that they can provide their users with an unbiased list of results. Webspam techniques have to evolve constantly to remain effective, but most of the time they consist of creating a specific linking architecture around the target page to increase its rank. In this paper we study the effects of graph clustering on Google's well-known ranking algorithm (PageRank) in the presence of Webspam. Since the web graph is far too big for classic clustering techniques to be applied, we present three lightweight techniques for clustering it. Experimental results show the value of the approach, which is further confirmed by statistical evidence.
Keywords :
Internet; graphs; pattern clustering; search engines; unsolicited e-mail; Google; PageRank; Web graph; Web page semantic; Web spam technique; fooling ranking algorithm; graph clustering; lightweight clustering method; search engine; specific linking architecture; target page; Clustering; Demotion; Webspam
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-8482-9
Electronic_ISBN :
978-0-7695-4191-4
DOI :
10.1109/WI-IAT.2010.243