DocumentCode :
2349003
Title :
Research on sentiment classification of Blog based on PMI-IR
Author :
Duan, Xiuting ; He, Tingting ; Song, Le
Author_Institution :
Dept. of Comput. Sci., Huazhong Normal Univ., Wuhan, China
fYear :
2010
fDate :
21-23 Aug. 2010
Firstpage :
1
Lastpage :
6
Abstract :
The growth of blog text on the Internet has brought new challenges to Chinese text classification. To address the lack of semantic information in traditional methods for Chinese text classification, this paper implements a method that classifies a blog as joy, angry, sad or fear using a simple unsupervised learning algorithm. The class of a blog text is predicted from the maximum semantic orientation (SO) of the phrases in the text that contain adjectives or adverbs. The SO of a phrase is calculated as the mutual information between the phrase and the polar words, and the SO of the blog text is then determined by the maximum mutual information value. For example, a blog text is classified as joy if the SO of its phrases is joy. Two corpora are used to test the method: the Blog corpus collected by the Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the Chinese dataset provided by the COAE2008 task. On both datasets, the method achieves a substantial improvement over traditional methods.
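The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the seed word lists, the toy English corpus, the document-level co-occurrence counts (standing in for retrieval hit counts), the smoothing constant `eps`, and the function names `hits`, `pmi`, `phrase_orientation`, and `classify` are all assumptions made for illustration.

```python
import math

# Hypothetical seed ("polar") words per emotion; the paper's actual lists are not given in this record.
SEED_WORDS = {
    "joy": ["happy", "delighted"],
    "angry": ["furious", "outraged"],
    "sad": ["tearful", "gloomy"],
    "fear": ["terrified", "scared"],
}

# Toy reference corpus: each document is a set of tokens (an assumption for this sketch).
DOCS = [
    {"wonderful", "happy", "trip"},
    {"wonderful", "delighted"},
    {"awful", "furious"},
    {"awful", "outraged", "furious"},
    {"lonely", "gloomy"},
    {"dark", "scared"},
]

def hits(docs, *terms):
    """Count documents containing every term (a stand-in for retrieval hit counts)."""
    return sum(1 for doc in docs if all(t in doc for t in terms))

def pmi(docs, phrase, word, eps=0.01):
    """Pointwise mutual information, log2(p(phrase, word) / (p(phrase) * p(word))).

    eps is added to every count so that zero counts do not produce log(0)
    or a division by zero.
    """
    n = len(docs)
    joint = hits(docs, phrase, word) + eps
    return math.log2(joint * n / ((hits(docs, phrase) + eps) * (hits(docs, word) + eps)))

def phrase_orientation(docs, phrase):
    """Score a phrase against each emotion as its best PMI with that emotion's seed words."""
    return {
        emotion: max(pmi(docs, phrase, w) for w in words)
        for emotion, words in SEED_WORDS.items()
    }

def classify(docs, phrases):
    """Label a text with the emotion that has the maximum PMI over all of its phrases."""
    best_label, best_score = None, -math.inf
    for phrase in phrases:
        for emotion, score in phrase_orientation(docs, phrase).items():
            if score > best_score:
                best_label, best_score = emotion, score
    return best_label

print(classify(DOCS, ["wonderful"]))
```

In this sketch a text is represented only by its candidate phrases, which the abstract restricts to phrases containing adjectives or adverbs; extracting them would require a part-of-speech tagger and is left out.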
Keywords :
Web sites; information retrieval; pattern classification; text analysis; unsupervised learning; Chinese text classification; PMI-IR algorithm; blog corpus; blog texts information; information retrieval; max semantic orientation; point-wise mutual information; sentiment classification; unsupervised learning algorithm; Classification algorithms; Mutual Information; PMI-IR Algorithm; Semantic Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
Type :
conf
DOI :
10.1109/NLPKE.2010.5587849
Filename :
5587849