Title :
A comparison of two text representations for sentiment analysis
Author :
Wang, Jianxiong ; Dong, Andy
Author_Institution :
Sch. of Comput. Sci. & Educ. Software, Guangzhou Univ., Guangzhou, China
Abstract :
This paper compares two representations of text within the same experimental setting for sentiment orientation analysis, focusing in particular on the sensitivity of the analysis to sentence length. The two representations compared are bag-of-words (BoW) and a nine-dimensional vector (9Dim). The former represents text with a high-dimensional feature vector, which ignores grammatical structure and is lexicon-dependent. In contrast, the 9Dim representation encodes grammatical knowledge of the clauses in sentences into a compact nine-dimensional vector, which is lexicon-independent. Each text is composed of multiple sentences, since the grammatical structure of a single sentence or clause may not provide sufficient information for sentiment orientation classification. A convenient way to enrich the grammatical knowledge in a text is to compose it from multiple sentences, thereby lengthening the sample. We consider the length of a text to be an important factor in text classification. The aim of this paper is to demonstrate how the performance of text sentiment orientation classifiers improves as the length of the text comprising a training vector is varied. The experimental results indicate that classifier accuracy benefits from increasing the length of the text, and they also show that the 9Dim method can provide results comparable to BoW under the same sentiment classification algorithm, support vector machines (SVM).
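As a rough illustration of the BoW side of the comparison, the sketch below groups consecutive sentences into longer samples and turns each sample into a token-count vector. It is only a minimal sketch under stated assumptions: the function names, the whitespace tokenization, and the toy sentences are hypothetical and are not taken from the paper. The 9Dim encoding and the SVM training step are left out, because the abstract does not specify the nine dimensions or the classifier settings.

```python
from collections import Counter


def group_sentences(sentences, k):
    """Concatenate every k consecutive sentences into one text sample."""
    return [" ".join(sentences[i:i + k]) for i in range(0, len(sentences), k)]


def build_vocabulary(samples):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for s in samples for tok in s.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(sample, vocabulary):
    """Return the token counts of one sample as a list over the vocabulary."""
    counts = Counter(sample.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]


# Toy data (hypothetical): four one-clause sentences, grouped two per sample.
sentences = ["good plot", "great cast", "bad sound", "poor ending"]
samples = group_sentences(sentences, 2)
vocabulary = build_vocabulary(samples)
vectors = [bow_vector(s, vocabulary) for s in samples]
```

Increasing `k` lengthens each sample, which is the factor the abstract varies. Note that the width of each vector grows with the vocabulary, which is the lexicon dependence the abstract attributes to BoW.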
Keywords :
knowledge representation; pattern classification; support vector machines; text analysis; compact nine dimensional vector; grammatical knowledge; grammatical structure; high dimensional feature vector; multisentences text classification; sentiment analysis; sentiment classification algorithm; sentiment orientation classification; text representation; text sentiment orientation classifiers; Classification algorithms; Complexity theory; Semantics; Support vector machine classification; Text categorization; Training; 9Dim; bag-of-words; text representations
Conference_Title :
Computer Application and System Modeling (ICCASM), 2010 International Conference on
Conference_Location :
Taiyuan
Print_ISBN :
978-1-4244-7235-2
Electronic_ISBN :
978-1-4244-7237-6
DOI :
10.1109/ICCASM.2010.5623265