  • DocumentCode
    2260210
  • Title
    Sentiment classification for Chinese reviews based on key substring features
  • Author
    Zhai, Zhongwu ; Xu, Hua ; Li, Jun ; Jia, Peifa
  • Author_Institution
    CS&T Dept., Tsinghua Univ., Beijing, China
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    One of the most widely studied sub-problems of opinion mining is sentiment classification, which classifies evaluative texts as positive or negative to help people automatically identify the viewpoints underlying online user-generated information. Most existing sentiment classification methods ignore both word sequence and the structural information of the unlabeled test documents. This paper proposes a transductive learning based algorithm that takes both types of information into account. The algorithm first selects key substrings in the suffix tree constructed from the strings in the training and unlabeled test documents, and then converts each original text document into a bag of numbers of the key substrings. Finally, SVM is employed to classify the converted documents. Experiments on an open dataset of 16,000 Chinese reviews demonstrate promising performance, with accuracy over 93.15%, which is much better than that of existing sentiment classification methods such as n-gram feature based classification algorithms. Experimental results also show that "tfidf-c" performs much better than other term weighting approaches in sentiment classification for a large text corpus. The reasons behind the proposed algorithm's outstanding performance are further studied and analyzed in this paper. Moreover, the proposed algorithm avoids the messy and rather artificial problem of defining word boundaries in the Chinese language.
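    The feature pipeline described in the abstract (select key substrings from training plus unlabeled test text, turn each document into counts of those substrings, weight them, then classify) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: brute-force substring enumeration stands in for the suffix tree, the "keep substrings occurring in at least `min_df` documents" selection rule is hypothetical (the abstract does not give the actual criterion), "tfidf-c" is assumed to mean tf * log(N/df) followed by cosine (unit-length) normalization, and all function and parameter names (`select_key_substrings`, `count_vector`, `tfidf_c`, `max_len`, `min_df`) are hypothetical.

```python
import math
from collections import Counter


def substrings(text, max_len):
    """Yield every contiguous substring of `text` whose length is 1..max_len."""
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            yield text[i:j]


def select_key_substrings(docs, max_len=3, min_df=2):
    """Hypothetical selection rule: keep substrings that occur in at least
    `min_df` documents. `docs` should hold BOTH training and unlabeled test
    texts (the transductive part). Brute-force enumeration stands in for
    the suffix tree, which finds the same substrings far more efficiently."""
    df = Counter()
    for doc in docs:
        df.update(set(substrings(doc, max_len)))  # document frequency
    return sorted(s for s, n in df.items() if n >= min_df)


def count_vector(doc, keys, max_len):
    """'Bag of numbers': how often each key substring occurs in `doc`
    (overlapping occurrences are counted). Use the same max_len as above."""
    counts = Counter(substrings(doc, max_len))
    return [counts[k] for k in keys]


def tfidf_c(vectors):
    """Assumed reading of tfidf-c: tf * log(N / df), then each document
    vector is scaled to unit Euclidean (cosine) length."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    df = [sum(1 for v in vectors if v[t] > 0) for t in range(n_terms)]
    weighted = []
    for v in vectors:
        w = [tf * math.log(n_docs / df[t]) if tf > 0 else 0.0
             for t, tf in enumerate(v)]
        norm = math.sqrt(sum(x * x for x in w))
        weighted.append([x / norm for x in w] if norm > 0 else w)
    return weighted


if __name__ == "__main__":
    corpus = ["这个很好", "这个不好", "很好"]  # toy reviews, not from the paper
    keys = select_key_substrings(corpus, max_len=2, min_df=2)
    features = tfidf_c([count_vector(d, keys, 2) for d in corpus])
    print(keys)
    for row in features:
        print([round(x, 3) for x in row])
```

    Because substrings are enumerated directly over characters, no word segmentation is needed, which mirrors the abstract's point about Chinese word boundaries. The weighted vectors would then be passed to a linear SVM (for example scikit-learn's `LinearSVC`); that step is omitted here to keep the sketch dependency-free.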
  • Keywords
    data mining; learning (artificial intelligence); natural language processing; pattern classification; text analysis; trees (mathematics); Chinese language; Chinese reviews; evaluative text classification; key substring features; opinion mining; sentiment classification; suffix tree; text document; tfidf-c; transductive learning based algorithm; word boundaries; Algorithm design and analysis; Classification algorithms; Data mining; Intelligent systems; Laboratories; Learning systems; Machine learning; Support vector machine classification; Support vector machines; Testing; Opinion Mining; Sentiment Classification; Substring; Transductive Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2009.5313782
  • Filename
    5313782