Title :
Comparison on the rule based method and statistical based method on emotion classification for Indonesian Twitter text
Author :
Aldy Rialdy Atmadja;Ayu Purwarianti
Author_Institution :
School of Electrical and Informatics Engineering, Bandung Institute of Technology, Indonesia
Abstract :
In this study, we conducted experiments on emotion classification of Indonesian Twitter text. For these experiments, we built a labeled corpus of 7622 Twitter texts taken from 69 Twitter accounts and manually labeled by 5 native speakers. We used 6 basic emotion labels (angry, disgust, fear, joy, sad, surprise) and added one label for a neutral emotion class. We compared a rule based method with a statistical based method. In the rule based method, we employed the existing Synesketch algorithm with two types of emotion word list: a manually written list and a translated WordNet-Affect list. In the statistical based method, we employed the SVM (Support Vector Machine) algorithm with unigram features and two feature selection algorithms, Information Gain and Minimum Frequency. In addition to the pure statistical based method, we also used the manually built emotion word list in the SVM based classification. For text pre-processing, we compared several methods: normalization, emotion conversion, stop word removal, number removal, and one-character token removal. The experimental results showed that the statistical based method, with an accuracy of 71.740%, outperformed the rule based method, with an accuracy of 63.172%. To improve on this result, we applied SMOTE (Synthetic Minority Over-sampling Technique) to handle the imbalanced data and achieved the best result, an f-measure of 83.203%. In another experiment, we combined the pure statistical method with the rule based method by adding the manually built word list to the classification features. This combination reached an f-measure of only 81.592%.
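As an illustration of three of the pre-processing steps named in the abstract (stop word removal, number removal, and one-character token removal) followed by unigram feature extraction, a minimal Python sketch is given here. It is not the authors' implementation: the function names `preprocess` and `unigram_features`, the regular-expression tokenizer, the tiny `STOP_WORDS` set, and the sample sentence are all assumptions made for illustration, and the normalization and emotion conversion steps are not covered.

```python
import re
from collections import Counter

# Hypothetical stop word list, for illustration only.
# The list actually used in the paper is not given in this record.
STOP_WORDS = {"yang", "dan", "di", "ke", "itu"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase and tokenize, then drop numbers, stop words,
    and one-character tokens (assumed order of steps)."""
    # Assumed tokenizer: runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = []
    for tok in tokens:
        if tok.isdigit():        # number removal
            continue
        if tok in stop_words:    # stop word removal
            continue
        if len(tok) == 1:        # one-character token removal
            continue
        kept.append(tok)
    return kept


def unigram_features(tokens):
    """Map each remaining token (unigram) to its count in the text."""
    return Counter(tokens)


# Made-up example sentence, not taken from the paper's corpus.
tokens = preprocess("Aku senang banget hari ini dan 2 teman q datang")
features = unigram_features(tokens)
```

In a statistical pipeline of the kind the abstract describes, count dictionaries like `features` would be converted to vectors, filtered by a feature selection step, and passed to a classifier such as an SVM.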
Keywords :
"Media","Twitter","Classification algorithms","Support vector machines","Machine learning algorithms","Feature extraction","Learning systems"
Conference_Title :
Information Technology Systems and Innovation (ICITSI), 2015 International Conference on
Print_ISBN :
978-1-4673-6663-2
DOI :
10.1109/ICITSI.2015.7437692