DocumentCode :
189134
Title :
Sentiment Categorization on a Creole Language with Lexicon-Based and Machine Learning Techniques
Author :
Rios, Adolfo A. ; Amarilla, Pedro J. ; Gimenez Lugo, Gustavo A.
Author_Institution :
Fac. Politec., Univ. Nac. de Asuncion, San Lorenzo, Paraguay
fYear :
2014
fDate :
18-22 Oct. 2014
Firstpage :
37
Lastpage :
43
Abstract :
We propose polarity detection from colloquial expressions distinctive of a bilingual population. The hybrid language we address is called "Jopara"; it is composed of Spanish and Guaraní, is spoken in Paraguay, and is similar to Louisiana Creole in the United States. We categorize polarity into three classes (positive, negative and neutral) and address this problem by applying both lexicon-based and machine-learning approaches. This document presents the application scenario, the building process of the bilingual lexicon and the attribute preprocessing used to create the classifiers' input. The input data is retrieved from Twitter, so the expressions are similar to natural language. Finally, results are displayed to compare the performance of these techniques when applied to this kind of language. It is shown that classical classifiers perform very well, with correct-classification rates of over 80% even with small training sets, if their parameters are properly adjusted along with an adequate selection of attributes.
Keywords :
learning (artificial intelligence); natural language processing; pattern classification; social networking (online); Guaraní; Jopara; Louisiana Creole language; Paraguay; Spanish; Twitter; United States; attribute preprocessing; attribute selection; bilingual lexicon; bilingual population; classifier input data retrieval; colloquial expressions; correction rates; hybrid language; lexicon-based technique; machine learning technique; natural language; negative class; neutral class; polarity detection; positive class; sentiment categorization; training sets; Classification algorithms; Companies; Kernel; Sentiment analysis; Support vector machine classification; Training; cross-lingual issues; emotion detection; lexical resources; micro blogging; multi-lingual;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems (BRACIS), 2014 Brazilian Conference on
Conference_Location :
Sao Paulo
Type :
conf
DOI :
10.1109/BRACIS.2014.18
Filename :
6984804