Title :
Using Inter-comment Similarity for Comment Spam Detection in Chinese Blogs
Author :
Wang, Jenq-Haur ; Lin, Ming-Sheng
Author_Institution :
Nat. Taipei Univ. of Technol., Taipei, Taiwan
Abstract :
Blogs have become one of the most popular means of communication in social communities, since blog posts can be replied to, commented on, and even shared with other users conveniently. All posts and comments, whether good or bad, have to be managed manually by blog owners. To prevent comment spam, most blog sites provide challenge-response tests such as CAPTCHA to ensure that a response comes from a human rather than being generated automatically by a computer. However, these tests cannot stop spammers from leaving spam messages manually. Existing studies of Chinese blog comment spam focus only on comments containing hyperlinks, which account for only a small portion of blog comment spam. In this paper, we propose adding inter-comment Jaccard similarity to the feature set for blog comment classification, alongside post-comment similarity, stop-word ratio, and comment length. To verify the effect of the inter-comment similarity features, we compared several classification algorithms such as C4.5, Naïve Bayes, and a neural network. Experimental results showed that the combination of inter-comment and post-comment similarity features, classified with C4.5, achieved the best performance. This demonstrates the effectiveness of the proposed inter-comment similarity feature for Chinese blog comment spam classification.
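The four features named in the abstract can be illustrated with a minimal sketch. The abstract does not specify tokenization or how inter-comment similarity is aggregated, so everything below is an assumption made for illustration: character bigrams stand in for word segmentation, the inter-comment feature is taken as the maximum Jaccard similarity to any other comment on the same post, stop words are treated as a set of single characters, and the names `char_bigrams`, `jaccard`, and `comment_features` are hypothetical.

```python
def char_bigrams(text):
    """Return the set of overlapping character bigrams of text (whitespace removed).

    Assumption: bigrams replace proper Chinese word segmentation.
    """
    chars = "".join(text.split())
    if len(chars) < 2:
        return set(chars)
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def comment_features(post, comments, index, stop_words):
    """Build an illustrative feature dictionary for comments[index].

    Assumption: inter-comment similarity is the maximum Jaccard similarity
    between this comment and every other comment in the list.
    """
    comment = comments[index]
    grams = char_bigrams(comment)
    others = [char_bigrams(c) for i, c in enumerate(comments) if i != index]
    inter = max((jaccard(grams, o) for o in others), default=0.0)
    chars = "".join(comment.split())
    # Assumption: stop_words is a set of single characters.
    stop_ratio = sum(ch in stop_words for ch in chars) / len(chars) if chars else 0.0
    return {
        "inter_comment_sim": inter,
        "post_comment_sim": jaccard(grams, char_bigrams(post)),
        "stop_word_ratio": stop_ratio,
        "comment_length": len(chars),
    }
```

Under these assumptions, a comment that duplicates another comment on the same post scores an inter-comment similarity of 1.0 regardless of whether it contains a hyperlink. Feature dictionaries of this kind could then be passed to a classifier such as a decision tree.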
Keywords :
Web sites; pattern classification; unsolicited e-mail; C4.5; CAPTCHA; Chinese blog sites; Naive Bayes; blog comment classification; comment length; comment spam detection; comment spam message; hyperlinks; intercomment Jaccard similarity; neural network; post-comment similarity; social communities; stopword ratio; Accuracy; Blogs; Feature extraction; Testing; Training; Unsolicited electronic mail; blog comment; inter-comment similarity; short document classification
Conference_Title :
Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-61284-758-0
Electronic_ISBN :
978-0-7695-4375-8
DOI :
10.1109/ASONAM.2011.49