Title :
Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm
Author :
Phakhawat Sarakit;Thanaruk Theeramunkong;Choochart Haruechaiyasak
Author_Institution :
School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand
Abstract :
Imbalanced datasets degrade classification performance in several data mining applications, including pattern recognition, text categorization, and information filtering. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE (Synthetic Minority Over-sampling Technique), which oversamples the instances of a minority class until their number matches that of the majority class. A YouTube dataset was balanced using SMOTE and tested with three machine learning algorithms: multinomial Naïve Bayes (MNB), decision tree (DT), and support vector machines (SVM). SVM achieved the highest accuracy, with 93.30% on the filtering task and 89.44% on the classification task. The results indicate that SMOTE can address the imbalanced data problem and improve classification results.
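The oversampling step described in the abstract can be illustrated with a minimal sketch. It assumes numeric feature vectors and Euclidean distance, and it uses only the Python standard library. The function names (`smote`, `balance`), the parameter `k`, the toy data, and the class labels "neutral" and "joy" are all hypothetical and are not taken from the paper. This is not the authors' implementation. Each synthetic instance is placed at a random point on the line segment between a minority instance and one of its k nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority instances")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority rows by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            new_rows = smote(rows, target - len(rows), k=k, seed=seed)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out


# Hypothetical toy data: 6 majority rows and 3 minority rows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = ["neutral"] * 6 + ["joy"] * 3

X_bal, y_bal = balance(X, y, k=2)
print(len(X_bal), y_bal.count("neutral"), y_bal.count("joy"))
```

After balancing, both classes contain the same number of instances, and the original rows are kept unchanged at the front of the output. In a typical workflow, oversampling of this kind is applied to the training split only, so that synthetic instances do not leak into the evaluation data.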
Keywords :
"Support vector machines","Accuracy","Filtering","Classification algorithms","Decision trees","YouTube","Machine learning algorithms"
Conference_Title :
Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015 2nd International Conference on
Print_ISBN :
978-1-4673-8142-0
DOI :
10.1109/ICAICTA.2015.7335373