• DocumentCode
    189134
  • Title
    Sentiment Categorization on a Creole Language with Lexicon-Based and Machine Learning Techniques
  • Author
    Rios, Adolfo A. ; Amarilla, Pedro J. ; Gimenez Lugo, Gustavo A.
  • Author_Institution
    Fac. Politec., Univ. Nac. de Asuncion, San Lorenzo, Paraguay
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    37
  • Lastpage
    43
  • Abstract
    We propose polarity detection for colloquial expressions distinctive of a bilingual population. The hybrid language we address is called "Jopara"; it combines Spanish and Guaraní, is spoken in Paraguay, and is similar to Louisiana Creole in the United States. We categorize polarity into three classes (positive, negative and neutral) and address this problem by applying both lexicon-based and machine-learning approaches. This paper presents the application scenario, the process of building the bilingual lexicon, and the attribute preprocessing used to create the classifiers' input. The input data is retrieved from Twitter, so the expressions are similar to natural language. Finally, results are presented to compare the performance of these techniques when applied to this kind of language. Classical classifiers are shown to perform very well, with correction rates of over 80% even with small training sets, provided their parameters are properly adjusted and the attributes are adequately selected.
  • Keywords
    learning (artificial intelligence); natural language processing; pattern classification; social networking (online); Guaraní; Jopara; Louisiana Creole language; Paraguay; Spanish; Twitter; United States; attribute preprocessing; attribute selection; bilingual lexicon; bilingual population; classifier input data retrieval; colloquial expressions; correction rates; hybrid language; lexicon-based technique; machine learning technique; natural language; negative class; neutral class; polarity detection; positive class; sentiment categorization; training sets; Classification algorithms; Companies; Kernel; Sentiment analysis; Support vector machine classification; Training; cross-lingual issues; emotion detection; lexical resources; micro blogging; multi-lingual;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type
    conf
  • DOI
    10.1109/BRACIS.2014.18
  • Filename
    6984804