  • DocumentCode
    118347
  • Title
    Sentiment analysis of Chinese micro-blog using vector space model
  • Author
    Zhi-Qiang Xiang; Y.X. Zou; Xin Wang
  • Author_Institution
    School of Electronic and Computer Engineering, Peking University, Shenzhen, China
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In recent years, micro-blog mining has become an active research field, especially because it can create commercial and political value in the fast-changing big data era. This paper investigates sentiment analysis of Chinese micro-blogs (SACM) using a vector space model. Based on an analysis of the inherent properties of Chinese micro-blogs, a sentiment analysis system is proposed that formulates the task as a two-class classification problem: positive sentiment versus negative sentiment. To achieve robust results, and based on an analysis of the dynamic and complicated expressions found in micro-blogs, a preprocessing approach has been developed that removes emotion-unrelated words, converts traditional Chinese expressions into simplified ones, and unifies punctuation. In addition, with the aid of word segmentation and word-frequency statistics, a vector space model is formed to generate sentiment-related micro-blog feature vectors. A support vector machine (SVM) is used as the classifier because of its strong ability to solve two-class classification problems. Experiments have been carried out to evaluate the proposed sentiment analysis system. Three different databases were used in the word segmentation stage: the emotion dictionary from Dalian University of Technology, the CNKI-HowNet emotional dictionary, and our self-established dictionary. Experimental results show that the proposed SACM system achieves 80.86% classification accuracy using these databases.
  • Keywords
    Web sites; data mining; information analysis; pattern classification; support vector machines; Big Data; Chinese microblog; SACM; SVM; emotion dictionary; microblog expression; microblog mining; negative sentiment; positive sentiment; preprocessing approach; sentiment analysis system; support vector machine; two-type classification problem; vector space model; word segmentation stage; Blogs; Classification algorithms; Databases; Dictionaries; Feature extraction; Sentiment analysis; Support vector machines; Chinese micro-blogs; classification; sentiment analysis; support vector machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041745
  • Filename
    7041745
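The processing chain summarized in the Abstract (preprocessing, word segmentation, term-frequency feature vectors in a vector space model, and an SVM classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The six-word lexicon, the six-character traditional-to-simplified table, and the helper names preprocess, segment, to_vector, train_svm and predict are all hypothetical. Forward maximum matching stands in for whatever segmenter the paper used, Unicode NFKC normalization stands in for its punctuation unification, and a linear SVM trained with the Pegasos subgradient method stands in for its SVM implementation, which the Abstract does not specify.

```python
import random
import unicodedata
from collections import Counter

# HYPOTHETICAL toy lexicon. The paper instead draws on the DUT emotion dictionary,
# the CNKI-HowNet emotional dictionary and a self-established dictionary.
LEXICON = ["喜欢", "开心", "好", "讨厌", "难过", "差"]

# HYPOTHETICAL tiny subset of a traditional-to-simplified character table.
TRAD_TO_SIMP = str.maketrans({"歡": "欢", "開": "开", "討": "讨",
                              "厭": "厌", "難": "难", "過": "过"})


def preprocess(text):
    """Map the listed traditional characters to simplified ones, then unify
    full-width punctuation (e.g. '，' -> ',') via NFKC normalization."""
    return unicodedata.normalize("NFKC", text.translate(TRAD_TO_SIMP))


def segment(text, vocab, max_len=4):
    """Forward maximum matching: at each position take the longest word found
    in vocab, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in vocab:
                words.append(text[i:i + n])
                i += n
                break
    return words


def to_vector(text):
    """Term-frequency vector over LEXICON; words outside the lexicon are
    dropped, which discards emotion-unrelated words."""
    counts = Counter(segment(preprocess(text), set(LEXICON)))
    return [float(counts[w]) for w in LEXICON]


def train_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). Labels are +1/-1; an appended constant 1.0
    feature acts as a bias term (regularized too, for simplicity)."""
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is checked with the current weights before the update.
            violated = label * sum(a * b for a, b in zip(w, x)) < 1
            w = [(1 - eta * lam) * a for a in w]
            if violated:
                w = [a + eta * label * b for a, b in zip(w, x)]
    return w


def predict(w, text):
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    x = to_vector(text) + [1.0]
    return 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1


# Toy training posts (invented for illustration, not data from the paper).
posts = ["我喜欢这部电影", "今天很开心", "服务很好", "喜欢，开心",
         "我讨厌下雨", "心情难过", "质量太差", "討厭，難過"]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_svm([to_vector(p) for p in posts], labels)
print(predict(w, "我很喜歡"), predict(w, "太差了，很难过"))
```

Restricting the feature vector to lexicon entries is one simple way to realize the "remove emotion-unrelated words" step; a practical system would need a complete conversion table, a proper Chinese segmenter, and the full dictionaries named in the Abstract.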