DocumentCode :
3673630
Title :
Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data
Author :
Joseph Prusa;Taghi M. Khoshgoftaar;David J. Dittman;Amri Napolitano
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2015
Firstpage :
197
Lastpage :
202
Abstract :
Sentiment classification of tweets is used for a variety of social sensing tasks and provides a means of discerning public opinion on a wide range of topics. A potential concern when performing sentiment classification is that the training data may contain class imbalance, which can negatively affect classification performance. A classifier trained on imbalanced data may be biased in favor of the majority class. One possible method of addressing this is to use data sampling to achieve a more balanced class distribution. In this work, we seek to observe how data sampling (using random undersampling with either a 50:50 or 35:65 positive:negative post-sampling class distribution ratio) affects the classification performance on tweet sentiment data. Our experimental results show that random undersampling significantly improves classification performance in comparison to not using any data sampling. Furthermore, there is no significant difference between selecting a 50:50 or 35:65 post-sampling class distribution ratio.
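The random undersampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_undersample`, its parameters, and the choice of the positive class (label 1) as the minority class are assumptions made for illustration. The sketch keeps every minority instance and randomly discards majority instances until the minority class reaches the requested share of the data (0.5 for 50:50, 0.35 for 35:65).

```python
import random


def random_undersample(labels, minority_fraction, minority_label=1, seed=0):
    """Return the sorted indices kept after random undersampling.

    Hypothetical helper: all minority instances are kept, and majority
    instances are dropped at random until the minority class makes up
    `minority_fraction` of the retained data.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Majority count that yields the requested minority share.
    target_majority = round(
        len(minority) * (1 - minority_fraction) / minority_fraction
    )
    # Undersampling only removes instances, so cap at what is available.
    target_majority = min(target_majority, len(majority))

    kept_majority = rng.sample(majority, target_majority)
    return sorted(minority + kept_majority)


# Toy data (assumed, not from the paper): 35 positive and 165 negative labels.
labels = [1] * 35 + [0] * 165
balanced = random_undersample(labels, minority_fraction=0.50)
skewed = random_undersample(labels, minority_fraction=0.35)
```

In practice the sampling would be applied to the training split only, so that the evaluation data keeps its original class distribution.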
Keywords :
"Feature extraction","Support vector machines","Training","Data mining","Training data","Analysis of variance","Machine learning algorithms"
Publisher :
IEEE
Conference_Title :
Information Reuse and Integration (IRI), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/IRI.2015.39
Filename :
7300975