DocumentCode :
1850300
Title :
Detection Method of Blog Spam Based on Categorization and Time Series Information
Author :
Teraguchi, Toshio ; Nakamura, Kenji ; Tanaka, Shigenori ; Kitano, Koichi
Author_Institution :
Grad. Sch. of Inf., Kansai Univ., Suita, Japan
fYear :
2012
fDate :
26-29 March 2012
Firstpage :
801
Lastpage :
808
Abstract :
Blogs have become well known as a tool for publishing information easily. However, as blog spam increases, an efficient spam-filtering method is required. In earlier research, methods that detect spam mail with a Bayesian filter achieved a high degree of accuracy by exploiting the characteristics of the words that appear in spam mail. However, applying the Bayesian filter to blog spam detection raises several problems. First, it takes many man-hours to maintain high accuracy continuously. Second, detection accuracy decreases because blogs contain a very wide variety of words, so the relative number of occurrences of each word decreases. Furthermore, the time at which each word occurred has to be considered. Therefore, in this paper, we acquire information to update the judgment information automatically and calculate the spam probability of words for every category to cope with these problems. In addition, we use time-series information to revise the spam probability of words, to cope with the problem that the words that occur change over time. With these countermeasures, we propose a new method for detecting blog spam. Comparative experiments indicate that the proposed method is better adapted than existing methods.
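The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it describes: word spam probabilities kept separately per category and revised by how recently each word was observed. The class name CategoryTimeSpamFilter, the exponential half-life weighting, the add-one smoothing, and the log-odds combination are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


class CategoryTimeSpamFilter:
    """Hypothetical sketch: per-category word spam probabilities
    computed from time-decayed observation counts."""

    def __init__(self, half_life_days=30.0):
        # Assumption: an observation loses half its weight every half_life_days.
        self.half_life_days = half_life_days
        # obs[category][label][word] -> list of days on which the word was seen
        self.obs = defaultdict(
            lambda: {"spam": defaultdict(list), "ham": defaultdict(list)}
        )

    def train(self, category, words, is_spam, day):
        """Record one labeled blog entry observed on the given day."""
        label = "spam" if is_spam else "ham"
        for word in set(words):
            self.obs[category][label][word].append(day)

    def _weight(self, days, now):
        # Sum of exponentially decayed weights; older observations count less.
        return sum(0.5 ** ((now - d) / self.half_life_days) for d in days)

    def word_spam_probability(self, category, word, now):
        """Spam probability of one word within one category at time `now`."""
        spam_w = self._weight(self.obs[category]["spam"][word], now)
        ham_w = self._weight(self.obs[category]["ham"][word], now)
        # Add-one smoothing: a word with no evidence scores exactly 0.5.
        return (spam_w + 1.0) / (spam_w + ham_w + 2.0)

    def spam_probability(self, category, words, now):
        """Combine word probabilities with a naive Bayes log-odds sum."""
        log_odds = 0.0
        for word in set(words):
            p = self.word_spam_probability(category, word, now)
            log_odds += math.log(p) - math.log(1.0 - p)
        # Clamp to keep math.exp within floating-point range.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


spam_filter = CategoryTimeSpamFilter(half_life_days=30.0)
spam_filter.train("travel", ["cheap", "pills"], is_spam=True, day=0)
spam_filter.train("travel", ["beach", "hotel"], is_spam=False, day=0)
fresh = spam_filter.word_spam_probability("travel", "cheap", now=0)
stale = spam_filter.word_spam_probability("travel", "cheap", now=30)
```

In this sketch the evidence for "cheap" weakens as it ages, so its probability drifts back toward the neutral value of 0.5, and the same word in a category with no training data stays at 0.5. Automatic acquisition of training labels, which the abstract also mentions, is not modeled here.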
Keywords :
Web sites; information filtering; probability; time series; unsolicited e-mail; blog spam detection method; information acquisition; judgment information; spam categorization; spam filtering method; spam mails; time series information; word spam probability; Blogs; Dictionaries; Equations; Mathematical model; Probability; Time series analysis; Unsolicited electronic mail; blog; category classification; filtering; spam detection; time series information;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Information Networking and Applications Workshops (WAINA), 2012 26th International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4673-0867-0
Type :
conf
DOI :
10.1109/WAINA.2012.217
Filename :
6185493