DocumentCode :
1876547
Title :
Classification of Chinese-to-English translated social network timelines using naive Bayes
Author :
Xiang-Ru Yu ; Zhong-Liang Xiang ; Dae-Ki Kang
Author_Institution :
Comput. Software Inst., Weifang Univ. of Sci. & Technol., Shouguang, China
fYear :
2015
fDate :
1-3 July 2015
Firstpage :
296
Lastpage :
299
Abstract :
This study proposes a method for classifying positive and negative comments from a Chinese social network (Weibo) using a naive Bayes classifier trained on a corpus from an English social network (Twitter). We train the text classifier on the English-language Twitter corpus and then use it to classify Chinese text. In previous research, Chinese sentences are processed with a Chinese word segmentation algorithm before a machine learning algorithm is applied. Segmentation algorithms split a Chinese sentence into a series of words because, unlike in English, a Chinese word may consist of several characters. The quality of the word segmentation algorithm therefore directly influences the accuracy of Chinese text categorization. In our research, we eliminate the Chinese word segmentation stage (a traditional preprocessing stage in Chinese text classification) so that the result no longer depends on the quality of the segmentation algorithm. Instead of segmenting, we translate the Chinese text into English with Google Translate. From the Twitter corpus, we directly build a text classifier using the multinomial naive Bayes algorithm. Finally, the classifier labels a new Chinese text (a Weibo post that has been translated into English by Google Translate at the preprocessing stage). We conduct an experiment comparing the accuracy of the multinomial naive Bayes algorithm with that of C4.5.
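The pipeline described in the abstract (train on English text, translate the Chinese input, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation: the training sentences are invented toy data rather than the Twitter corpus, `translate_zh_to_en` is a hypothetical lookup table standing in for the machine translation step, and the `MultinomialNB` class, whitespace tokenizer, and Laplace smoothing parameter `alpha` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # English text is split on whitespace; no word segmentation is needed.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial naive Bayes with Laplace smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / self.total_docs)
            n_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (n_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def translate_zh_to_en(text):
    # Hypothetical stand-in for the translation step: a fixed lookup table,
    # not a call to any real translation service.
    toy_translations = {
        "我很开心": "i am very happy",
        "这太糟糕了": "this is terrible",
    }
    return toy_translations[text]


# Toy English training data (invented for this sketch).
train_texts = [
    "i am so happy today",
    "what a great day",
    "this is terrible",
    "i hate this awful day",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = MultinomialNB().fit(train_texts, train_labels)

# Chinese input is translated first, then classified by the English model.
pred_pos = clf.predict(translate_zh_to_en("我很开心"))
pred_neg = clf.predict(translate_zh_to_en("这太糟糕了"))
print(pred_pos, pred_neg)
```

The point of the design is that the Chinese text never passes through a segmenter: only the translated English string reaches the classifier, so whitespace tokenization is sufficient.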
Keywords :
Bayes methods; language translation; learning (artificial intelligence); natural language processing; pattern classification; social networking (online); text analysis; Chinese characters; Chinese sentence processing; Chinese social network positive-negative comments; Chinese text categorization; Chinese text classification; Chinese text translation; Chinese word segmentation algorithm; Chinese-to-English translated social network timeline classification; English sentences; English social network corpus; Google translator; Twitter corpus; Weibo text; machine learning algorithm; naive Bayes multinomial algorithm; text classifier training; Algorithm design and analysis; Classification algorithms; Computer science; Hidden Markov models; Machine learning algorithms; Text categorization; Twitter; Classification; Comment; Multinomial model; Naive Bayes; Text categorization; Weibo;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Communication Technology (ICACT), 2015 17th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-8-9968-6504-9
Type :
conf
DOI :
10.1109/ICACT.2015.7224807
Filename :
7224807