Title :
Sentiment analysis of Chinese micro-blog using vector space model
Author :
Xiang, Zhi-Qiang; Zou, Y.X.; Wang, Xin
Author_Institution :
Sch. of Electron. Comput. Eng., Peking Univ., Shenzhen, China
Abstract :
In recent years, micro-blog mining has become an active research field, particularly because it can create commercial and political value in the fast-changing big data era. This paper investigates sentiment analysis of Chinese micro-blogs (SACM) using a vector space model. Based on an analysis of the characteristics of Chinese micro-blogs, a sentiment analysis system is proposed that formulates the task as a two-class classification problem: positive versus negative sentiment. To achieve robust results, a preprocessing approach, derived from an analysis of the dynamic and complicated expressions found in micro-blogs, is developed to remove emotion-unrelated words, convert traditional Chinese characters to simplified ones, and unify punctuation. Then, with the aid of word segmentation and word-frequency statistics, a vector space model is built to generate a sentiment-related feature vector for each micro-blog. A support vector machine (SVM) is adopted as the classifier because of its strong performance on two-class classification problems. Experiments are carried out to evaluate the proposed system. Three lexical databases are used in the word segmentation stage: the emotion dictionary from Dalian University of Technology, the CNKI-Hownet emotion dictionary, and our self-established dictionary. Experimental results show that the proposed SACM system achieves 80.86% classification accuracy using these databases.
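The pipeline summarized in the abstract (preprocessing, dictionary-based word segmentation, a frequency-based vector space model, and an SVM classifier) can be sketched end to end as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the toy tables `T2S`, `LEXICON` and `FEATURES` stand in for the Dalian University of Technology, CNKI-Hownet and self-established dictionaries; treating URLs, @mentions and #topic# tags as the "emotion-unrelated" content is an assumption; forward maximum matching is a swapped-in segmentation method; and the linear SVM is trained with Pegasos-style subgradient descent in pure Python rather than with any particular SVM package.

```python
import re
from collections import Counter

# --- Hypothetical toy resources (NOT the DUT, CNKI-Hownet or self-built dictionaries) ---
# Character-level traditional -> simplified mapping, just enough for the demo.
T2S = str.maketrans({"這": "这", "電": "电", "歡": "欢", "難": "难",
                     "過": "过", "討": "讨", "厭": "厌", "開": "开"})
# Unify half-width punctuation to the full-width forms common in Chinese text.
PUNCT = str.maketrans({"!": "！", "?": "？", ",": "，", ".": "。"})
# Segmentation lexicon (general words plus sentiment words).
LEXICON = {"今天", "电影", "这部", "喜欢", "开心", "难过", "讨厌", "失望"}
# Sentiment terms that define the dimensions of the vector space.
FEATURES = ["喜欢", "开心", "难过", "讨厌", "失望"]
# Assumed "emotion-unrelated" content: URLs, @mentions, #topic# tags.
NOISE = re.compile(r"https?://\S+|@[^\s:：]+[:：]?|#[^#]*#")


def preprocess(text):
    text = NOISE.sub("", text)    # drop URLs, @mentions and #topics#
    text = text.translate(T2S)    # traditional -> simplified characters
    text = text.translate(PUNCT)  # unify punctuation to full-width
    return text.strip()


def segment(text, max_len=4):
    """Forward maximum matching; unknown characters become single-char tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in LEXICON:
                tokens.append(word)
                i += size
                break
    return tokens


def to_vector(tokens):
    """Term-frequency vector over FEATURES, plus a constant 1.0 acting as the bias."""
    counts = Counter(tokens)
    return [float(counts[f]) for f in FEATURES] + [1.0]


def train_linear_svm(X, y, lam=0.1, epochs=500):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regulariser step)
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, text):
    x = to_vector(segment(preprocess(text)))
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical four-post training set: +1 positive, -1 negative.
train = [("今天好开心！", 1), ("我喜欢这部电影", 1),
         ("真讨厌，好失望", -1), ("这部电影让人难过", -1)]
X = [to_vector(segment(preprocess(s))) for s, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)

print(predict(w, "@小明：我很喜歡這部電影!"))  # 1 (positive)
print(predict(w, "討厭 http://t.cn/abc"))      # -1 (negative)
```

Folding the bias into the weight vector through a constant feature keeps the Pegasos update a single expression. With only four training posts the learned weights mean nothing beyond the demo; a realistic setup would draw the feature dimensions from the full emotion dictionaries and use a mature SVM package.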
Keywords :
Web sites; data mining; information analysis; pattern classification; support vector machines; Big Data; Chinese microblog; SACM; SVM; emotion dictionary; microblog expression; microblog mining; negative sentiment; positive sentiment; preprocessing approach; sentiment analysis system; support vector machine; two-type classification problem; vector space model; word segmentation stage; Blogs; Classification algorithms; Databases; Dictionaries; Feature extraction; Sentiment analysis; Support vector machines; Chinese micro-blogs; classification; sentiment analysis; support vector machine;
Conference_Title :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
DOI :
10.1109/APSIPA.2014.7041745