Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA, USA
Abstract :
Sentiment analysis studies public opinion toward an entity and is an important research area in data mining. Recently, many sentiment analysis models have been proposed, including supervised and unsupervised approaches. However, the role of supervised models has been undermined by the phenomenon of big data, and unsupervised models are drawing increasing attention. Most current unsupervised methods are based on Latent Dirichlet Allocation (LDA) and require the number of aspects to be specified in advance, which makes them subjective. In addition, these methods treat factual words and opinioned words alike and assume that each sentence contains only one aspect; together, these limitations make the existing unsupervised methods unsatisfactory. To solve these problems, this paper proposes a novel hybrid Hierarchical Dirichlet Process-Latent Dirichlet Allocation (HDP-LDA) model. The model can automatically determine the number of aspects, distinguish factual words from opinioned words, and effectively extract aspect-specific sentiment words. Experimental results show that our model clearly captures the aspects people mention and the specific sentiment words they use for each aspect, improving the performance of sentiment analysis. Finally, we compared our model with influential topic models, namely JST, AUSM and MaxEnt-LDA, on online restaurant reviews and found that our model performs very well.
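The contrast between fixing the number of aspects in advance (as in LDA) and letting the data determine it (as in a Dirichlet-process-based model) can be illustrated with a Chinese restaurant process, the clustering prior that underlies Dirichlet process models. The following is only a minimal sketch of that prior, not the HDP-LDA model proposed in the paper; the function name `crp_assignments` and the concentration parameter `alpha` are assumptions introduced for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process.

    Illustrative only: item i joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters is never fixed up front.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster chosen for item i
    for i in range(n_items):
        # Draw a point on [0, i + alpha]; the first i units of mass belong
        # to existing clusters, the remaining alpha units to a new cluster.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

In this sketch a larger `alpha` tends to open more clusters, which plays the role that the hand-picked number of aspects plays in a plain LDA setup.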
Keywords :
Big Data; data mining; human factors; statistical analysis; unsupervised learning; AUSM; JST; MaxEnt-LDA; big data phenomenon; data mining; factual words; hybrid HDP-LDA model; hybrid hierarchical Dirichlet process-latent Dirichlet allocation model; influential topic models; online restaurant review; opinioned words; public opinions; sentiment analysis models; supervised models; unsupervised approach; Adaptation models; Analytical models; Context; Dictionaries; Educational institutions; Probabilistic logic; Resource management; Aspect Detection; Hierarchical Dirichlet Process; Latent Dirichlet Allocation; Probabilistic Model; Sentiment Analysis;