Translation Is Not Enough: Comparing Lexicon-based Methods for Sentiment Analysis in Persian

Title in another language

Authors

Mohammad Ehsan Basiri, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran; Arman Kabiri, Department of Computer Engineering, Shahrekord University, Shahrekord, Iran

Keywords

Data mining, Opinion mining, Lexicon-based approach, Persian language, Natural language processing, Sentiment analysis

Publication date

Aban 1396 (Solar Hijri calendar; October-November 2017)

Conference title

18th International Symposium on Computer Science and Software Engineering

Abstract (English)

Abstract: Sentiment analysis is a subfield of data mining and natural language processing that aims to extract people's opinions and appraisals from their comments on the Web. In contrast to machine learning approaches, lexicon-based methods have important advantages: they are domain-independent and do not need a large annotated training corpus, and are therefore faster. These advantages make the lexicon-based approach prevalent in the sentiment analysis community. For Persian, however, unlike English, the use of lexicon-based methods is still new. Few lexicons are available for sentiment analysis in Persian, and almost all of them are directly translated from English. In the current study, four lexicons are compared to show how much the choice of lexicon affects the performance of document-level sentiment analysis. Specifically, the Persian versions of the NRC lexicon, SentiStrength, CNRC, and Adjectives are compared in a purely lexicon-based scenario. Experiments are carried out on the document-level edition of the SPerSent dataset. The results show that the direct translation used in NRC leads to the poorest performance, while the pre-processing and refinement of the lexicons used in SentiStrength and CNRC improve performance. The results also show that using only adjectives yields better results than using NRC.
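
The "pure lexicon-based scenario" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries below are hypothetical romanized Persian words with made-up polarity values (they are not taken from NRC, SentiStrength, CNRC, or Adjectives), tokenization is a plain whitespace split, and the function names `score_document` and `classify` are chosen only for this example.

```python
from typing import Dict, Iterable

# Toy polarity lexicon. ASSUMPTION: these entries and weights are invented
# for illustration and do not come from any lexicon named in the abstract.
TOY_LEXICON: Dict[str, int] = {
    "khub": 1,    # "good"
    "aali": 2,    # "excellent"
    "bad": -1,    # "bad"
    "zaif": -1,   # "weak"
}


def score_document(tokens: Iterable[str], lexicon: Dict[str, int]) -> int:
    """Sum the polarity of every token found in the lexicon (others count as 0)."""
    return sum(lexicon.get(token, 0) for token in tokens)


def classify(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Label a whole document by the sign of its summed polarity score."""
    score = score_document(text.split(), lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("keyfiyat aali bud"))    # one positive entry matches
    print(classify("batri zaif va bad"))    # two negative entries match
```

Because the label depends only on which words the lexicon covers and how it weights them, swapping one lexicon for another changes the output without any training step, which is the kind of comparison the abstract describes.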

Country

Iran

Number of pages: 2

From page

To page

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=303552