Title :
Detecting spam webpages through topic and semantics analysis
Author :
Jing Wan;Mufan Liu;Junkai Yi;Xuechao Zhang
Author_Institution :
Beijing University of Chemical Technology, China
fDate :
6/1/2015
Abstract :
Spam webpages have posed great challenges to the development of search engines. Content spam is among the most commonly used spam techniques, and with the development of Internet technologies it has become increasingly difficult to detect. Current methods for detecting webpages that use content spam rely primarily on statistical features, which have obvious limitations. In this article, a spam webpage detection method based on topic and semantics was proposed, using two categories of features: semantic and statistical. Topic modeling was first performed over the contents of the webpage, mapping the webpage contents into the topic space. Semantic analysis and calculation were then carried out in the topic space according to the distribution of topics. The extracted semantic features were combined with the statistical features to classify webpages. The results verified that the proposed method achieves better detection results.
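The abstract describes a pipeline in which page text is mapped to a topic distribution, semantic features are computed from that distribution, and those features are concatenated with statistical features before classification. The sketch below illustrates only the feature-construction step under stated assumptions: the topic distributions are assumed to come from some topic model (for example LDA, which the abstract does not name), and every feature here (topic entropy, peak topic weight, Jensen-Shannon divergence between a hypothetical title and body topic distribution, word count, unique-word ratio, top-word frequency) is a hypothetical illustration, not the authors' actual feature set.

```python
import math
from collections import Counter


def entropy(dist):
    """Shannon entropy (nats) of a topic distribution.
    Hypothetical semantic feature: low entropy means the page is concentrated on few topics."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions of equal length."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 whenever x > 0 because y is the midpoint.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def statistical_features(text):
    """Hypothetical statistical features: word count, unique-word ratio, top-word frequency."""
    words = text.lower().split()
    n = len(words)
    if n == 0:
        return [0.0, 0.0, 0.0]
    counts = Counter(words)
    return [float(n), len(counts) / n, counts.most_common(1)[0][1] / n]


def page_features(title_topics, body_topics, body_text):
    """Concatenate hypothetical semantic (topic-space) features with statistical features.
    title_topics / body_topics are assumed outputs of an external topic model."""
    semantic = [
        entropy(body_topics),
        max(body_topics),
        js_divergence(title_topics, body_topics),
    ]
    return semantic + statistical_features(body_text)


if __name__ == "__main__":
    feats = page_features([0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
                          "cheap pills cheap pills buy now")
    print(feats)
```

The resulting vector would then be passed to any standard classifier; keeping semantic and statistical features in one flat vector mirrors the abstract's description of combining the two feature categories, without implying which classifier the authors used.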
Keywords :
"Semantics","Feature extraction","Analytical models","Search engines","Algorithm design and analysis","Mathematical model","Internet"
Conference_Title :
Computer & Information Technology (GSCIT), 2015 Global Summit on
Print_ISBN :
978-1-4673-6586-4
DOI :
10.1109/GSCIT.2015.7353328