DocumentCode :
1995444
Title :
MIKA: A tagged corpus for modern standard Arabic and colloquial sentiment analysis
Author :
Ibrahim, Hossam S. ; Abdou, Sherif M. ; Gheith, Mervat
Author_Institution :
Comput. Sci. Dept., Inst. of Stat. Studies & Res. (ISSR), Cairo Univ., Cairo, Egypt
fYear :
2015
fDate :
9-11 July 2015
Firstpage :
353
Lastpage :
358
Abstract :
Sentiment analysis (SA) and opinion mining (OM) have become a field of interest that has attracted considerable research attention during the last decade, owing to the growing number of Internet documents (especially online reviews and comments) on social media such as blogs and social networks. Many attempts have been made to build corpora for SA, since such resources are considered a key factor in SA and OM systems. However, the need for these resources remains, especially for morphologically rich languages (MRLs) such as Arabic. In this paper, we present MIKA, a multi-genre tagged corpus of modern standard Arabic (MSA) and colloquial Arabic. MIKA is manually collected and annotated at the sentence level with semantic orientation (positive, negative, or neutral). A rich set of linguistically motivated features (contextual intensifiers, contextual shifters, and negation handling), syntactic features for conflicting phrases, and others are used in the annotation process. Our data focus on MSA and Egyptian dialectal Arabic. We report the effort of manually building and annotating our sentiment corpus using different types of data, such as tweets and Arabic microblogs (hotel reservations, product reviews, and TV program comments).
Keywords :
Internet; data mining; natural language processing; social networking (online); text analysis; Arabic microblogs; Egyptian dialectal Arabic; Internet documents; MIKA; colloquial sentiment analysis; contextual intensifiers; contextual shifter; linguistically motivated features; modern standard Arabic sentiment analysis; morphologically-rich language; multigenre tagged corpus; negation handling; opinion mining; social media; syntactic features; Blogs; Data mining; Internet; Pragmatics; Sentiment analysis; Standards; Syntactics; Arabic corpus; opinion mining; polarity strength; sentiment analysis; sentiment polarity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
Type :
conf
DOI :
10.1109/ReTIS.2015.7232904
Filename :
7232904