DocumentCode :
708161
Title :
Multi-way sentiment classification of Arabic reviews
Author :
Al Shboul, Bashar ; Al-Ayyoub, Mahmoud ; Jararweh, Yaser
Author_Institution :
Carleton Univ., Ottawa, ON, Canada
fYear :
2015
fDate :
7-9 April 2015
Firstpage :
206
Lastpage :
211
Abstract :
The evolution of the Web and the appearance of new technologies have given Internet users new ways to express their opinions and feelings regarding different aspects of life. Such expressions are written in an unstructured way using natural languages. They hold a great deal of knowledge about users' opinions and reactions on various subjects. As a result, a new field called Sentiment Analysis (SA) has come into existence to address the complicated task of extracting such opinions or sentiments from the massive pool of unstructured text available online. Traditional works on SA consider only two sentiments: positive and negative. Multi-way SA considers sentiments expressed using a star or ranking system. For example, in a 5-star ranking system, the user's opinion ranges from very negative (1 star) to very positive (5 stars). This version of SA is much harder to handle, which partly explains the limited number of works on it. Moreover, we focus in this work on the Arabic language, which is largely understudied compared to the English language. In this work, a new and relatively large Arabic dataset is used. The dataset, called the Large Arabic Book Reviews (LABR) dataset, is gathered from an online book reviews website. The objective of this work is to perform baseline experiments on this dataset by applying the Bag-Of-Words approach coupled with the most popular classifiers. We also investigate the effect of stemming and balancing the dataset. The obtained accuracies are low, confirming the intuition that the multi-way SA problem is very difficult and needs further attention.
Keywords :
Web sites; classification; data mining; natural language processing; social sciences computing; text analysis; Arabic language; Internet users; LABR dataset; Web evolution; bag-of-words; dataset balancing; dataset stemming; large Arabic book reviews dataset; multiway SA problem; multiway sentiment classification; natural languages; negative sentiments; online book reviews Website; opinion mining; positive sentiments; sentiment analysis; star ranking system; unstructured text; user opinions; user reactions; Accuracy; Internet; Market research; Niobium; Sentiment analysis; Support vector machines; Arabic Text; Bag-Of-Words; Decision Tree; K-Nearest Neighbor; Multi-Way Sentiment Analysis; Naive Bayes; Voting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Communication Systems (ICICS), 2015 6th International Conference on
Conference_Location :
Amman
Type :
conf
DOI :
10.1109/IACS.2015.7103228
Filename :
7103228