DocumentCode
708161
Title
Multi-way sentiment classification of Arabic reviews
Author
Al Shboul, Bashar ; Al-Ayyoub, Mahmoud ; Jararweh, Yaser
Author_Institution
Carleton Univ., Ottawa, ON, Canada
fYear
2015
fDate
7-9 April 2015
Firstpage
206
Lastpage
211
Abstract
The evolution of the Web and the appearance of new technologies have given Internet users new ways to express their opinions and feelings about different aspects of life. Such expressions are written in an unstructured way using natural languages, and they hold a great deal of knowledge about users' opinions and reactions on various subjects. As a result, a new field called Sentiment Analysis (SA) has come into existence to address the complicated task of extracting such opinions or sentiments from the massive pool of unstructured text available online. Traditional works on SA consider only two sentiments: positive and negative. Multi-way SA considers sentiments expressed using a star or ranking system. For example, in a 5-star ranking system, the user's opinion ranges from very negative (1 star) to very positive (5 stars). This version of SA is much harder to handle, which partly explains the limited number of works on it. Moreover, this work focuses on the Arabic language, which is largely understudied compared to English. A new and relatively large Arabic dataset is used: the Large Arabic Book Reviews (LABR) dataset, gathered from an online book reviews website. The objective of this work is to perform baseline experiments on this dataset by applying Bag-Of-Words coupled with the most popular classifiers. We also investigate the effect of stemming and of balancing the dataset. The obtained accuracies are low, confirming the intuition that the multi-way SA problem is very difficult and needs further attention.
Keywords
Web sites; classification; data mining; natural language processing; social sciences computing; text analysis; Arabic language; Internet users; LABR dataset; Web evolution; bag-of-words; dataset balancing; dataset stemming; large Arabic book reviews dataset; multiway SA problem; multiway sentiment classification; natural languages; negative sentiments; online book reviews Website; opinion mining; positive sentiments; sentiment analysis; star ranking system; unstructured text; user opinions; user reactions; Accuracy; Internet; Market research; Niobium; Sentiment analysis; Support vector machines; Arabic Text; Bag-Of-Words; Decision Tree; K-Nearest Neighbor; Multi-Way Sentiment Analysis; Naive Bayes; Voting;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Systems (ICICS), 2015 6th International Conference on
Conference_Location
Amman
Type
conf
DOI
10.1109/IACS.2015.7103228
Filename
7103228