DocumentCode :
3508846
Title :
Content Based Spam Text Classification: An Empirical Comparison between English and Chinese
Author :
Liumei Zhang ; Jianfeng Ma ; Yichuan Wang
Author_Institution :
Sch. of Comput. Sci. & Technol., Xi'an Shiyou Univ., Xi'an, China
fYear :
2013
fDate :
9-11 Sept. 2013
Firstpage :
69
Lastpage :
76
Abstract :
Spam text, including e-mail and SMS messages, is a real and growing problem, primarily due to the wide availability of digital handsets and the Internet. Filtering spam text has become a leading topic across various areas of study, because the text bodies of different forms of communication expose channels for spammers. In this study, English and Chinese text datasets are pre-processed, and classical classifiers are applied to the pre-processed datasets to compare the accuracy of the same classifier on each language. The behavior of the classifiers on English and Chinese text is evaluated, and the experimental results are discussed. In addition, whereas most existing text spam detection methods are based on English, classifiers suited to English text classification are shown to be insufficient for Chinese text classification.
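The abstract does not name the classifiers or the pre-processing steps, so the following is only a minimal sketch of a content-based spam pipeline under stated assumptions. It assumes a multinomial Naive Bayes classifier with add-one smoothing (one common classical text classifier, not confirmed as one the paper uses), whitespace-style word tokens for English, and overlapping character bigrams for Chinese as a simple stand-in for word segmentation, since Chinese text has no spaces between words. The names `tokenize_en`, `tokenize_zh`, `NaiveBayes` and `accuracy` are hypothetical, and the toy messages are invented for illustration; they are not drawn from the paper's datasets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize_en(text):
    """English (assumed pre-processing): lowercase, keep runs of ASCII letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_zh(text):
    """Chinese (assumed pre-processing): drop spaces and punctuation, then emit
    overlapping character bigrams as a crude substitute for word segmentation."""
    chars = [c for c in text if c.isalnum()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, tokenize):
        self.tokenize = tokenize
        self.class_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)     # token frequencies per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, doc):
        tokens = self.tokenize(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for t in tokens:                         # smoothed log likelihoods
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy training data (invented, not from the paper).
EN_DOCS = ["win a free prize now", "free cash offer click now",
           "meeting moved to monday", "please review the draft report"]
EN_LABELS = ["spam", "spam", "ham", "ham"]
ZH_DOCS = ["恭喜您中奖了，请点击领取", "免费领取大奖", "明天下午开会", "请查收会议纪要"]
ZH_LABELS = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    en_model = NaiveBayes().fit(EN_DOCS, EN_LABELS, tokenize_en)
    zh_model = NaiveBayes().fit(ZH_DOCS, ZH_LABELS, tokenize_zh)
    print(en_model.predict("free prize offer"))
    print(zh_model.predict("免费中奖领取"))
```

The same classifier is trained separately on each language, and only the tokenizer changes. Note that `tokenize_en` returns no tokens at all for Chinese input, which illustrates in a simplified way why pre-processing tuned for English does not carry over to Chinese text.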
Keywords :
information filtering; natural language processing; pattern classification; text analysis; unsolicited e-mail; Chinese text classification; Chinese text dataset; English text classification; English text dataset; classical classifiers; content based spam text classification; spam text filtering; text spam detection methods; Accuracy; Educational institutions; Entropy; Frequency measurement; Training; Unsolicited electronic mail; classification; data mining; machine learning; spam text;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Networking and Collaborative Systems (INCoS), 2013 5th International Conference on
Conference_Location :
Xi'an
Type :
conf
DOI :
10.1109/INCoS.2013.21
Filename :
6630291