Conference record number:
3296
Article title:
Translation Is Not Enough: Comparing Lexicon-based Methods for Sentiment Analysis in Persian
Title in another language:
Translation Is Not Enough: Comparing Lexicon-based Methods for Sentiment Analysis in Persian
Authors:
Mohammad Ehsan Basiri, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran; Arman Kabiri, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran
Keywords:
Data Mining, Opinion Mining, Lexicon-based Approach, Persian Language, Natural Language Processing, Sentiment Analysis
Conference title:
18th International Symposium on Computer Science and Software Engineering
English abstract:
Abstract: Sentiment analysis is a subfield of data mining and
natural language processing that aims to extract people's
opinions and appraisals from their comments on the Web.
In contrast to machine learning approaches, lexicon-based methods
have important advantages such as domain independence and
no need for a large annotated training corpus, and hence are
faster. This makes the lexicon-based approach prevalent in the
sentiment analysis community. However, for the Persian language, in
contrast to English, the use of lexicon-based methods is relatively new.
Few lexicons are available for sentiment analysis in
Persian, and almost all of them are direct translations from English. In
the current study, four lexicons are compared to show the
importance of lexicons in the performance of document-level
sentiment analysis. Specifically, the Persian version of the NRC
lexicon, SentiStrength, CNRC, and Adjectives are compared in a
pure lexicon-based scenario. Experiments are carried out on the
document-level edition of the SPerSent dataset. Results show that
the direct translation used in NRC leads to the poorest performance,
while pre-processing and refining the lexicons, as done in SentiStrength
and CNRC, improves performance. The results also show that
using only adjectives yields better results than using
NRC.