Title of article
Detection of cloaked web spam by using tag-based methods
Author/Authors
Lin, Jun-Lin
Issue Information
Journal with serial number, year 2009
Pages
7
From page
7493
To page
7499
Abstract
Web spam attempts to manipulate search engine ranking algorithms in order to boost the rankings of specific web pages in search results. Cloaking is a widely adopted technique for concealing web spam: the server returns different content to search engine crawlers than it displays in a web browser. Previous work on cloaking detection relies mainly on differences in terms and/or links between multiple copies of a URL retrieved from the web browser and search engine crawler perspectives. This work presents three methods that use differences in tags to determine whether a URL is cloaked. Because the tags of a web page generally do not change as frequently or as significantly as its terms and links, tag-based cloaking detection methods can work more effectively than term- or link-based methods. The proposed methods are tested on a dataset of URLs covering short-, medium- and long-term user interests. Experimental results indicate that the tag-based methods outperform term- or link-based methods in both precision and recall. Moreover, a Weka J4.8 classifier using a combination of term and tag features yields an accuracy of 90.48%.
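The abstract does not spell out the three tag-based methods, so the following is only a minimal sketch of the general idea of comparing tags between a browser copy and a crawler copy of the same URL. The function names (`tag_counts`, `tag_difference`, `looks_cloaked`), the multiset-difference measure, and the `threshold` value are all assumptions made for illustration, not the paper's actual procedure.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Only the tag name is kept; attributes and text are ignored.
        self.tags.append(tag)


def tag_counts(html):
    """Return a multiset (Counter) of the opening tags found in html."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return Counter(parser.tags)


def tag_difference(html_a, html_b):
    """Count tag occurrences that appear in one copy but not the other."""
    a, b = tag_counts(html_a), tag_counts(html_b)
    # Counter subtraction keeps only positive counts, so the sum of both
    # directions is the size of the symmetric multiset difference.
    return sum(((a - b) + (b - a)).values())


def looks_cloaked(browser_copy, crawler_copy, threshold=3):
    """Hypothetical decision rule: flag the URL when the tag difference
    exceeds an arbitrary, illustrative threshold."""
    return tag_difference(browser_copy, crawler_copy) > threshold


browser = "<html><body><p>Hello</p></body></html>"
crawler = ("<html><body><p>Hello</p>"
           "<a href='x'>1</a><a>2</a><a>3</a><a>4</a></body></html>")
print(tag_difference(browser, crawler))
print(looks_cloaked(browser, crawler))
```

A change in text alone leaves the tag multiset untouched, which reflects the abstract's argument that tags are more stable than terms and links. In practice one would likely compare several copies from each perspective to separate cloaking from ordinary page updates.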
Keywords
Web spam, Cloaking detection, Classification
Journal title
Expert Systems with Applications
Serial Year
2009
Record number
2346459