DocumentCode :
3399059
Title :
Link farm detection using SVMLight tool
Author :
Saraswathi, D. ; Kathiravan, A. Vijaya ; Kavitha, R.
Author_Institution :
K.S. Rangasamy Coll. of Arts & Sci., Namakkal, India
fYear :
2012
fDate :
10-12 Jan. 2012
Firstpage :
1
Lastpage :
5
Abstract :
Search engine spam is a web page, or a portion of a web page, created with the intention of increasing its ranking in search engines. Web spamming refers to actions intended to mislead search engines and give some pages a higher ranking than they deserve. Anyone who uses a search engine frequently has most likely encountered a high-ranking page that consists of nothing more than a collection of query keywords. These pages detract both from the user experience and from the quality of the search engine. Search engine spam has recently increased dramatically, creating problems for both search engines and web surfers: it degrades the search engine's results, occupies more memory and consumes more time when creating indexes, and frustrates users by returning irrelevant results. Search engines have tried many techniques to filter out these spam pages before they can appear on the query results page. This paper presents various ways of creating spam pages, a collection of current methods used to detect spam, and a new approach to building a link spam detection tool based on machine learning. The new approach uses the SVMLight tool to detect link spam, considering only the link structure of the Web, regardless of page contents. Statistical features of this link structure are used to build a classifier that is tested over a large collection of Web link spam. Link farms can be identified based on the hub and authority degrees of links. The spam classifier makes use of the WordNet word database and the SVMLight tool to classify web links as either spam or not spam. These features relate not only to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links.
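The abstract does not list the exact features or the training setup, so the following is only a minimal sketch of the general idea: derive link-structure features (in-degree, out-degree, and HITS-style hub and authority scores) for each page and write them in the sparse text format that SVMLight reads (`<target> <index>:<value> ...`). The function names, the choice of four features, and the iteration count are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def hits_scores(edges, iterations=50):
    """HITS by power iteration; returns (hub, authority, in_links, out_links).

    `edges` is an iterable of (source, target) page pairs.
    Scores are L2-normalised after every update.
    """
    edges = list(edges)
    nodes = sorted({page for edge in edges for page in edge})
    in_links, out_links = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
        in_links[dst].append(src)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of pages linking in.
        auth = {n: sum(hub[s] for s in in_links[n]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of pages linked to.
        hub = {n: sum(auth[d] for d in out_links[n]) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth, in_links, out_links


def svmlight_line(label, features):
    """Format one example as '<target> 1:<v1> 2:<v2> ...' (indices start at 1)."""
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{label:+d} {pairs}"


def link_feature_lines(edges, labels):
    """Yield one SVMLight line per labelled page (+1 spam, -1 not spam).

    Assumed feature order: in-degree, out-degree, hub score, authority score.
    """
    hub, auth, in_links, out_links = hits_scores(edges)
    for page, label in sorted(labels.items()):
        features = [
            float(len(in_links[page])),
            float(len(out_links[page])),
            hub.get(page, 0.0),
            auth.get(page, 0.0),
        ]
        yield svmlight_line(label, features)
```

Writing the lines from `link_feature_lines` to a file gives a training set that could then be passed to SVMLight's `svm_learn` and `svm_classify` programs. Densely interlinked pages in a farm tend to reinforce each other's hub and authority scores, which is why such scores are plausible inputs for a link-only classifier.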
Keywords :
Web sites; learning (artificial intelligence); pattern classification; search engines; security of data; support vector machines; SVMLight tool; Web link spam detection; Web link structure; Web page; Web spamming; Web surfer; Wordnet word database; hub degree; link authority; link farm detection; machine learning; search engine ranking; search engine spam; spam classifier; statistical feature; Conferences; Crawlers; Informatics; Search engines; Unsolicited electronic mail; Web pages; Classification; Click Spam; Cloaking; Link Farm; PageRank; Search engine; Spamdexing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Communication and Informatics (ICCCI), 2012 International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4577-1580-8
Type :
conf
DOI :
10.1109/ICCCI.2012.6158833
Filename :
6158833