DocumentCode :
3093964
Title :
A Web Text Filter Based on Rough Set Weighted Bayesian
Author :
Wu, Yu ; She, Kun ; Zhu, Williams ; Yue, Xiaojun ; Luo, Huiqiong
Author_Institution :
Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
fYear :
2009
fDate :
12-14 Dec. 2009
Firstpage :
241
Lastpage :
245
Abstract :
With the deep penetration of the Internet, the uncontrolled flood of information has become one of the most serious problems facing Internet users. Harmful content such as pornography, violence, and other illegal material has seriously affected society as a whole, and young people in particular. In this paper, a novel Web text filter based on rough set and Bayesian theory is proposed to analyze the text content of Web pages and filter out harmful pages. Some current feature selection methods, such as inverse document frequency (IDF), do not take classification information into account. To avoid this shortcoming, rough set theory is used to reduce the original feature terms. In addition, a novel coefficient weighting method based on rough sets is proposed and introduced into the Bayesian formula, which greatly improves filtering performance. In the final experiment, the novel method is compared with other weighting methods applied in the Bayesian formula, such as TF, IDF, and TFIDF. The results demonstrate that the novel filter works efficiently.
Keywords :
Bayes methods; Internet; information filtering; rough set theory; text analysis; Web pages; Web text filter; coefficient weighted method; feature selection methods; inverse document frequency; rough set weighted Bayesian theory; text content analysis; Bayesian methods; Feature extraction; Filtering theory; Information filters; Probability; Set theory; Uniform resource locators; Bayesian theory; Rough set;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Dependable, Autonomic and Secure Computing, 2009. DASC '09. Eighth IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3929-4
Electronic_ISBN :
978-1-4244-5421-1
Type :
conf
DOI :
10.1109/DASC.2009.38
Filename :
5380357