Title :
Generating New Features Using Genetic Programming to Detect Link Spam
Author :
Shengen, Li ; Xiaofei, Niu ; Peiqi, Li ; Lin, Wang
Author_Institution :
Sch. of Comput. Sci. & Technol., Shandong Jianzhu Univ., Jinan, China
Abstract :
Link spam techniques can give some pages higher-than-deserved rankings in search engine results, which degrades the quality of those results. Classification methods can detect link spam, and in any classification problem the features play an important role. This paper proposes deriving new features from existing link-based features using genetic programming (GP) and using the new features as inputs to SVM and GP classifiers to identify link spam. Experiments on WEBSPAM-UK2006 show that classifiers using 10 newly generated features perform much better than classifiers using the original 41 link-based features and on par with classifiers using 138 transformed link-based features. The newly generated features can therefore improve link spam classification performance.
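The idea of evolving a new feature from existing ones can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the function set, fitness measure, or GP parameters, so the operators (+, -, *), the Fisher-style separation score, the mutation-only search, and the synthetic three-feature data below are all assumptions chosen for brevity. In practice the evolved expressions would be evaluated on real link-based features and the resulting values passed to a classifier such as an SVM.

```python
import math
import operator
import random

# Assumed function set for the evolved expressions (not taken from the paper).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(n_features, depth, rng):
    """Build a random expression tree; a leaf is an int index into the feature vector."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, x):
    """Compute the derived feature value for one sample x."""
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, X, y):
    """Assumed fitness: squared gap between class means over summed class variances."""
    groups = {0: [], 1: []}
    for x, label in zip(X, y):
        groups[label].append(evaluate(tree, x))
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    var = {c: sum((t - means[c]) ** 2 for t in v) / len(v) for c, v in groups.items()}
    score = (means[0] - means[1]) ** 2 / (var[0] + var[1] + 1e-9)
    return score if math.isfinite(score) else 0.0

def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))

def evolve_feature(X, y, generations=20, pop_size=30, seed=0):
    """Evolve one derived feature; the population is seeded with the original features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = list(range(n_features))
    population += [random_tree(n_features, 3, rng) for _ in range(pop_size - n_features)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, X, y), reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection keeps the best half
        children = [mutate(rng.choice(survivors), n_features, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, X, y))

# Synthetic stand-in data: the label depends on how two features compare,
# which no single original feature captures well on its own.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [1 if x[0] > x[1] else 0 for x in X]

best = evolve_feature(X, y)
new_feature = [evaluate(best, x) for x in X]  # values a downstream classifier could consume
print(best, round(fitness(best, X, y), 3))
```

Because the initial population contains each original feature as a one-node tree and the best half always survives, the evolved feature in this sketch is never scored worse than the best original feature under the assumed fitness.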
Keywords :
Internet; feature extraction; genetic algorithms; information retrieval; pattern classification; search engines; support vector machines; GP classifier; SVM; WEBSPAM-UK2006; classification method; genetic programming; link spam detection; link-based feature generation; search engine; search result quality; Accuracy; Binary trees; Feature extraction; Genetic programming; Support vector machines; Unsolicited electronic mail; Web pages; Feature Generation; Genetic Programming; Link Spam;
Conference_Title :
Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on
Conference_Location :
Shenzhen, Guangdong
Print_ISBN :
978-1-61284-289-9
DOI :
10.1109/ICICTA.2011.41