DocumentCode :
3700095
Title :
Evaluation of text classification techniques for inappropriate web content blocking
Author :
Igor Kotenko;Andrey Chechulin;Dmitry Komashinsky
Author_Institution :
Laboratory of Computer Security Problems, St. Petersburg Institute for Informatics and Automation, 39, 14th Liniya, St. Petersburg, Russia
Volume :
1
fYear :
2015
Firstpage :
412
Lastpage :
417
Abstract :
This paper addresses the automated categorization of textual information for use in systems that block inappropriate content. An approach to feature selection and construction is proposed. The text mining methods used in the research (Decision Tree classifiers) are analyzed. In addition, techniques are suggested for analyzing web sites that provide information in different languages. The collection and analysis of the text features required for classification into particular categories are investigated. Experimental results on the correspondence of texts to different categories are given, and the classification quality is evaluated. The text classification component developed as a result of this work is intended for implementation in F-Secure systems that block inappropriate web content.
Keywords :
"Text categorization","Dictionaries","Measurement","Feature extraction","Training","Decision trees","Internet"
Publisher :
IEEE
Conference_Title :
Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), 2015 IEEE 8th International Conference on
Print_ISBN :
978-1-4673-8359-2
Type :
conf
DOI :
10.1109/IDAACS.2015.7340769
Filename :
7340769