DocumentCode :
124215
Title :
A Statistical and Evolutionary Approach to Sentiment Analysis
Author :
Carvalho, Julien ; Prado, Adriana ; Plastino, Alexandre
Author_Institution :
Dept. of Comput. Sci., Univ. Fed. Fluminense, Niteroi, Brazil
Volume :
2
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
110
Lastpage :
117
Abstract :
In recent years, the Web has become a huge source of opinion-rich data. Social media, such as Twitter, are regarded as public diaries, where millions of people express their sentiments and opinions in their daily interactions. One of the biggest challenges in analyzing such data is classifying its polarity, that is, determining whether it carries a positive or negative connotation. Statistical methods for this purpose are inspired by the observation that if two words frequently appear together within the same context, they are likely to have the same polarity. Consequently, the polarity of a word can be determined by calculating its relative frequency of co-occurrence with special words, called paradigm words, whose polarity does not vary with context (e.g., "good" and "bad"). For example, a tweet can be classified as positive if the majority of its words are more strongly associated with "good" than with "bad". In current statistical approaches, paradigm words are chosen according to various criteria, without any prior evaluation of how well they perform. Motivated by this, we propose a statistical method for classifying tweets in which the paradigm words are selected by a genetic algorithm. The algorithm searches a candidate set of paradigm words for a subset that significantly improves classification accuracy. Additionally, we hypothesize that the appropriate set of paradigm words may vary with the data domain. For example, paradigm words that work well for classifying tweets about movies may not be suitable for classifying tweets about products. We verify this hypothesis empirically, using tweets from different domains, and show that our approach handles this problem appropriately.
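The abstract describes the method only at a high level, so the following is a minimal sketch, under loudly stated assumptions, of how its two ideas fit together: scoring words by relative co-occurrence with positive and negative paradigm words, classifying a tweet by majority vote of its words, and using a genetic algorithm to pick the subset of paradigm words that maximizes accuracy on labeled tweets. Everything specific here is an illustrative assumption and not the authors' published configuration: the association measure (share of documents containing a paradigm word that also contain the target word), averaging over paradigm words, the tie-breaking rule, the candidate words carrying a known polarity, and the GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism). All function names (`cooccurrence`, `association`, `word_polarity`, `classify`, `accuracy`, `evolve`) are hypothetical.

```python
import random
from collections import Counter
from functools import lru_cache
from itertools import combinations


def cooccurrence(corpus):
    """Count, for each word and each unordered word pair, how many documents contain it."""
    word_df, pair_df = Counter(), Counter()
    for doc in corpus:
        words = set(doc.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_df, pair_df


def association(word, paradigm, word_df, pair_df):
    """Assumed measure: share of documents containing `paradigm` that also contain `word`."""
    if word == paradigm or word_df[paradigm] == 0:
        return 0.0
    return pair_df[frozenset((word, paradigm))] / word_df[paradigm]


def word_polarity(word, pos, neg, word_df, pair_df):
    """+1 if `word` leans toward the positive paradigm words, -1 if negative, 0 on a tie."""
    p = sum(association(word, w, word_df, pair_df) for w in pos) / len(pos)
    n = sum(association(word, w, word_df, pair_df) for w in neg) / len(neg)
    return (p > n) - (p < n)


def classify(tweet, pos, neg, word_df, pair_df):
    """Majority vote over the tweet's words; ties fall back to negative (an assumption)."""
    votes = [word_polarity(w, pos, neg, word_df, pair_df) for w in set(tweet.lower().split())]
    return 1 if votes.count(1) > votes.count(-1) else -1


def accuracy(mask, candidates, labeled, word_df, pair_df):
    """Fitness: accuracy on labeled tweets using the paradigm words switched on in `mask`."""
    pos = [w for (w, pol), bit in zip(candidates, mask) if bit and pol > 0]
    neg = [w for (w, pol), bit in zip(candidates, mask) if bit and pol < 0]
    if not pos or not neg:  # both polarities are needed to compare associations
        return 0.0
    hits = sum(classify(t, pos, neg, word_df, pair_df) == y for t, y in labeled)
    return hits / len(labeled)


def evolve(candidates, labeled, word_df, pair_df,
           pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Genetic algorithm over bitmasks that selects a subset of candidate paradigm words."""
    rng = random.Random(seed)
    n = len(candidates)

    @lru_cache(maxsize=None)
    def fitness(mask):
        return accuracy(mask, candidates, labeled, word_df, pair_df)

    # Seed the population with the "use every candidate" mask plus random masks.
    population = [(1,) * n] + [tuple(rng.randint(0, 1) for _ in range(n))
                               for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            a = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            b = max(rng.sample(ranked, 3), key=fitness)
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))  # uniform crossover
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)
```

Because the all-candidates mask is placed in the initial population and elitism keeps the best masks, the returned subset never scores below using every candidate paradigm word on the same labeled tweets. Running `evolve` separately on tweets from each domain (movies, products, and so on) would yield a domain-specific subset, which mirrors the abstract's domain hypothesis; evaluating the chosen subset on held-out tweets, rather than the ones used as fitness, would be needed to measure real gains.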
Keywords :
genetic algorithms; pattern classification; social networking (online); statistical analysis; text analysis; Twitter; evolutionary approach; genetic algorithm; paradigm words; sentiment analysis; statistical method; tweets classification; Accuracy; Biological cells; Equations; Feature extraction; Genetic algorithms; Semantics; Statistical analysis; Twitter; genetic algorithm; opinion mining; sentiment analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
Type :
conf
DOI :
10.1109/WI-IAT.2014.87
Filename :
6927614