DocumentCode
124215
Title
A Statistical and Evolutionary Approach to Sentiment Analysis
Author
Carvalho, Julien ; Prado, Adriana ; Plastino, Alexandre
Author_Institution
Dept. of Comput. Sci., Univ. Fed. Fluminense, Niteroi, Brazil
Volume
2
fYear
2014
fDate
11-14 Aug. 2014
Firstpage
110
Lastpage
117
Abstract
In recent years, the Web has become a huge source of opinionated data. Social media such as Twitter are regarded as public diaries, where millions of people express their sentiments and opinions in their daily interactions. One of the biggest challenges in analyzing such data is classifying its polarity, that is, whether it carries a positive or negative connotation. For this purpose, statistical methods have been inspired by the observation that if two words frequently appear together within the same context, they are likely to have the same polarity. Consequently, the polarity of a word can be determined by calculating its relative frequency of co-occurrence with special words, called paradigm words, whose polarities are invariant (e.g., "good" and "bad"). In this way, one can classify a tweet, for example, as carrying a positive polarity if the majority of its words are more strongly associated with the word "good" than with the word "bad". In current statistical approaches, paradigm words have been selected according to different criteria, without any prior evaluation. Motivated by this observation, we propose to classify tweets via a statistical method in which the paradigm words are selected by a genetic algorithm. This algorithm explores a set of candidate paradigm words to find a subset that significantly improves classification accuracy. Additionally, we believe that an appropriate set of paradigm words may vary with the data domain. For example, paradigm words used to classify tweets about movies may not be suitable for classifying tweets about products. We confirm this assumption empirically, using tweets from different domains, and show that our approach handles this problem properly.
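The pipeline described in the abstract (co-occurrence polarity, word-majority voting, and a genetic algorithm that picks a paradigm-word subset) can be illustrated with the minimal sketch below. It is not the authors' implementation, and every detail is an assumption made for illustration: co-occurrence is counted per tweet; the association of a word with a paradigm word is the share of tweets containing the paradigm word that also contain the word; a word's score averages over the positive paradigm set minus the negative set; zero-score words abstain and ties fall to "negative"; and the GA uses binary tournament selection, one-point crossover, bit-flip mutation, and elitism with arbitrary parameters. All function names and the toy corpus are hypothetical.

```python
import random
from collections import Counter


def build_counts(tweets):
    """Tweet-level document frequencies of single words and of unordered word pairs."""
    word_df, pair_df = Counter(), Counter()
    for tweet in tweets:
        words = sorted(set(tweet.lower().split()))  # naive whitespace tokenisation (assumption)
        word_df.update(words)
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                pair_df[(a, b)] += 1  # a < b because `words` is sorted
    return word_df, pair_df


def association(word, paradigm, word_df, pair_df):
    """Share of the tweets containing `paradigm` that also contain `word` (assumed measure)."""
    if word == paradigm or word_df[paradigm] == 0:
        return 0.0
    key = (word, paradigm) if word < paradigm else (paradigm, word)
    return pair_df[key] / word_df[paradigm]


def word_polarity(word, pos, neg, word_df, pair_df):
    """> 0 if `word` leans towards the positive paradigm words, < 0 if towards the negative ones."""
    p = sum(association(word, w, word_df, pair_df) for w in pos) / len(pos)
    n = sum(association(word, w, word_df, pair_df) for w in neg) / len(neg)
    return p - n


def classify(tweet, pos, neg, word_df, pair_df):
    """Majority vote of the tweet's words; zero-score words abstain, ties fall to 'negative'."""
    votes = 0
    for word in set(tweet.lower().split()):
        score = word_polarity(word, pos, neg, word_df, pair_df)
        votes += (score > 0) - (score < 0)
    return "positive" if votes > 0 else "negative"


def fitness(mask, candidates, train, word_df, pair_df):
    """Training accuracy of the paradigm-word subset selected by the bit vector `mask`."""
    pos = [w for bit, (w, pol) in zip(mask, candidates) if bit and pol == "positive"]
    neg = [w for bit, (w, pol) in zip(mask, candidates) if bit and pol == "negative"]
    if not pos or not neg:
        return 0.0  # a usable subset needs at least one word of each polarity
    hits = sum(classify(t, pos, neg, word_df, pair_df) == label for t, label in train)
    return hits / len(train)


def evolve(candidates, train, word_df, pair_df, pop_size=20, generations=30,
           crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Generational GA over bit vectors: binary tournament, one-point crossover,
    bit-flip mutation and single-individual elitism (all assumed choices)."""
    rng = random.Random(seed)
    n = len(candidates)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:  # memoise: fitness evaluation is the expensive step
            cache[key] = fitness(mask, candidates, train, word_df, pair_df)
        return cache[key]

    def pick():
        a, b = rng.sample(population, 2)  # binary tournament
        return a if score(a) >= score(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=score)[:]]  # elitism: carry over the current best
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        population = children
    best = max(population, key=score)
    return best, score(best)


# Toy usage: an invented six-tweet corpus, not data from the paper.
corpus = [
    "good movie great acting", "great plot good cast", "excellent film good fun",
    "bad movie awful acting", "awful plot bad cast", "terrible film bad ending",
]
candidates = [("good", "positive"), ("great", "positive"), ("excellent", "positive"),
              ("bad", "negative"), ("awful", "negative"), ("terrible", "negative")]
train = [("great acting", "positive"), ("awful ending", "negative")]

word_df, pair_df = build_counts(corpus)
mask, acc = evolve(candidates, train, word_df, pair_df)
print([w for bit, (w, _) in zip(mask, candidates) if bit], acc)
```

Encoding each individual as one bit per candidate paradigm word keeps the search space a plain subset lattice, so any fitness measure (here, training accuracy) can be plugged in. In practice, accuracy on held-out tweets from the target domain would be the natural way to check whether a subset evolved for one domain transfers to another.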
Keywords
genetic algorithms; pattern classification; social networking (online); statistical analysis; text analysis; Twitter; evolutionary approach; genetic algorithm; paradigm words; sentiment analysis; statistical method; tweets classification; Accuracy; Biological cells; Equations; Feature extraction; Genetic algorithms; Semantics; Statistical analysis; opinion mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location
Warsaw
Type
conf
DOI
10.1109/WI-IAT.2014.87
Filename
6927614