Title :
Content-Based Text Classifiers for Pornographic Web Filtering
Author :
Polpinij, Jantima ; Chotthanom, Anirut ; Sibunruang, Chumsak ; Chamchong, Rapeeporn ; Puangpronpitag, Somnuk
Author_Institution :
Mahasarakham Univ., Mahasarakham
Abstract :
Due to the flood of pornographic web sites on the Internet, effective web filtering systems are essential. Content-based web filtering has become one of the important techniques for handling and filtering inappropriate information on the web. We examine two machine learning algorithms (support vector machines and Naive Bayes) for pornographic web filtering based on text content, focusing initially on Thai-language and English-language web sites. Our aim is to investigate whether machine learning algorithms are suitable for web site classification. The empirical results show that the classifier based on support vector machines is more effective for pornographic web filtering than the Naive Bayes classifier, especially with respect to the over-blocking problem.
Keywords :
Bayes methods; Internet; content-based retrieval; information filtering; learning (artificial intelligence); natural languages; support vector machines; text analysis; Naive Bayes classifier; content-based text classifier; machine learning algorithm; pornographic Web filtering system; pornographic Web site; support vector machine; Cybernetics; Information filters; Machine learning algorithms; Support vector machine classification; Text categorization; Uniform resource locators; Web pages; Naïve Bayes; Pornographic web filtering; Text Classification;
Conference_Title :
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
1-4244-0099-6
Electronic_ISBN :
1-4244-0100-3
DOI :
10.1109/ICSMC.2006.384926