Title :
Optimized Swarm Search-Based Feature Selection for Text Mining in Sentiment Analysis
Author :
Simon Fong;Elisa Gao;Raymond Wong
Author_Institution :
Dept. of Comput. &
Abstract :
Sentiment analysis has emerged as an important computational domain for gaining insights from snippets of text as social media has grown in popularity. Text mining has long been a fundamental data analytics technique for sentiment analysis. One popular preprocessing approach in text mining is transforming text strings into word vectors, which form a high-dimensional sparse matrix. This sparse matrix poses challenges to the induction of an accurate sentiment classification model. Feature selection is usually applied to find a subset of the original features in the sparse matrix, in order to enhance the accuracy of the classification model. In this paper, a new feature selection method called Optimized Swarm Search-based Feature Selection (OS-FS) is proposed. OS-FS is a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. The swarm search in OS-FS is optimized by a new feature evaluation technique called Clustering-by-Coefficient-of-Variation (CCV). The proposed scheme is verified via a mood classification scenario in which 100 sample news articles are extracted from CNN.com. The task is for a computer to recognize, using text mining, one of six human emotions (or sentiments) from the news content. The results show the superiority of OS-FS over traditional feature selection methods.
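The word-vector preprocessing step and the statistic that CCV is named after can be illustrated with a minimal sketch. This is not the authors' OS-FS or CCV implementation, which the abstract does not specify: it only builds a small term-count matrix and computes the coefficient of variation (standard deviation divided by mean) for each feature column. The function names `word_vectors` and `coefficient_of_variation`, the whitespace tokenizer, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_vectors(docs):
    """Turn each document into a term-count vector over a shared vocabulary.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts.get(term, 0) for term in vocab])
    return vocab, matrix


def coefficient_of_variation(column):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    n = len(column)
    mean = sum(column) / n
    if mean == 0:
        return 0.0
    variance = sum((x - mean) ** 2 for x in column) / n
    return math.sqrt(variance) / mean


# Toy corpus (hypothetical, not the CNN.com data used in the paper).
docs = [
    "markets rally on good news",
    "storm brings sad news",
    "good news for markets",
]
vocab, matrix = word_vectors(docs)

# Count the zero entries to show how sparse even a tiny matrix is.
zero_cells = sum(v == 0 for row in matrix for v in row)
total_cells = len(matrix) * len(vocab)

# One coefficient of variation per feature (vocabulary term).
cv_by_term = {
    term: coefficient_of_variation([row[j] for row in matrix])
    for j, term in enumerate(vocab)
}
```

In this toy corpus, a term that occurs once in every document has a coefficient of variation of zero, while a term confined to a single document has a high one. How CCV clusters features by this statistic and feeds the result into the swarm search is not described in the abstract and is not attempted here.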
Keywords :
"Sparse matrices","Sentiment analysis","Text mining","Correlation","Media","Training"
Conference_Title :
Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
Electronic_ISBN :
2375-9259
DOI :
10.1109/ICDMW.2015.231