DocumentCode :
3756828
Title :
Utilizing Ensemble, Data Sampling and Feature Selection Techniques for Improving Classification Performance on Tweet Sentiment Data
Author :
Joseph Prusa;Taghi M. Khoshgoftaar;Amri Napolitano
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2015
Firstpage :
535
Lastpage :
542
Abstract :
Sentiment analysis of tweets is a popular method for mining opinions from social media. Many machine learning techniques can improve the performance of classifiers trained to determine the sentiment or emotional polarity of a tweet; however, they are designed with different objectives, and it is unclear which techniques are most beneficial. Additionally, these techniques may behave differently depending on data quality issues such as class imbalance, a common problem when using real-world data. In an effort to determine which techniques are more important, we tested 12 techniques: eight feature selection techniques, bagging, boosting, and data sampling with two post-sampling class ratios. Using five base learners, we compare these techniques against each other and against each base learner with no additional technique. We train and test each classifier on a balanced dataset and two imbalanced datasets with different class ratios. Additionally, we conduct statistical tests to determine whether the differences observed between techniques are significant. Our results show that bagging and seven of the eight feature selection techniques significantly improve performance (compared to using no technique) on all three datasets, while boosting and data sampling are less beneficial for imbalanced tweet sentiment data. To the best of our knowledge, this is the first study comparing these three types of techniques on tweet sentiment data and the first to show that feature selection and ensemble techniques perform better than data sampling on tweet sentiment data.
Keywords :
"Boosting","Bagging","Training data","Robustness","Support vector machines","Training","Data mining"
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on
Type :
conf
DOI :
10.1109/ICMLA.2015.21
Filename :
7424371