Title :
Uncovering Cloaking Web Pages with Hybrid Detection Approaches
Author :
Jun Deng ; Hao Chen ; Jianhua Sun
Author_Institution :
Coll. of Inf. Sci. & Eng., Hunan Univ., Changsha, China
Abstract :
Web search cloaking, used by spammers to increase the visiting rates of their websites, is a spamming technique that is challenging for search engines. Existing cloaking detection systems have two shortcomings: the accuracy of their algorithms is not high enough, and the types of cloaking techniques that can be detected are limited. In this paper, we present a new system to address these two problems. To improve detection accuracy, our algorithm combines text-, tag-, and URL-based methods. To detect more types of cloaking techniques, our system drives a real browser to execute scripts in web pages, crawls each page a second time with a modified referrer field in the HTTP headers, and obtains the search engine's cached page for further comparison. We apply our system to 104,800 URLs extracted from Yahoo. Results show that our system achieves high accuracy, with precision at 94.52% and recall at 98.57%. Our system also successfully detects more types of cloaking techniques.
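The abstract does not give the similarity formula, so the following is only a minimal sketch of how text, tag, and URL comparisons might be combined when two copies of the same page are compared (for example, the copy served to a crawler and the copy rendered in a browser). The Jaccard measure, the weights, the threshold, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import re
from html.parser import HTMLParser


class FeatureExtractor(HTMLParser):
    """Collects visible words, tag names, and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words, self.tags, self.urls = set(), set(), set()
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        if tag in ("script", "style"):
            self._skip += 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.update(re.findall(r"\w+", data.lower()))


def features(html):
    """Return (words, tags, urls) as three sets for one page."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.words, parser.tags, parser.urls


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hybrid_similarity(html_a, html_b, weights=(0.5, 0.25, 0.25)):
    """Weighted mean of text, tag, and URL similarity (weights are hypothetical)."""
    fa, fb = features(html_a), features(html_b)
    return sum(w * jaccard(x, y) for w, x, y in zip(weights, fa, fb))


def looks_cloaked(crawler_copy, browser_copy, threshold=0.5):
    """Flag a page when the two copies are less similar than the assumed threshold."""
    return hybrid_similarity(crawler_copy, browser_copy) < threshold
```

In this sketch, `looks_cloaked` would be called once per pair of copies (crawler versus browser, original versus modified-referrer crawl, or live page versus cached page); a real system would need to tolerate legitimate dynamic content such as rotating ads.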
Keywords :
Web sites; information retrieval; search engines; security of data; unsolicited e-mail; HTTP header; URL based method; Web pages cloaking; Web search cloaking; Web site; cloaking detection system; hybrid detection approach; search engines; spamming technique; tag based method; text based method; Accuracy; Browsers; Crawlers; HTML; IP networks; Search engines; Web pages; Cloak; Cloaking Techniques; SEO; search terms; similarity detection algorithm;
Conference_Title :
Computational and Business Intelligence (ISCBI), 2013 International Symposium on
Conference_Location :
New Delhi
Print_ISBN :
978-0-7695-5066-4
DOI :
10.1109/ISCBI.2013.65