DocumentCode
659591
Title
Scalable sentiment classification for Big Data analysis using Naïve Bayes Classifier
Author
Bingwei Liu ; Blasch, Erik ; Yu Chen ; Dan Shen ; Genshe Chen
Author_Institution
Intell. Fusion Technol., Inc., Germantown, MD, USA
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
99
Lastpage
104
Abstract
A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this paper, we aim to evaluate the scalability of the Naïve Bayes classifier (NBC) on large datasets. Instead of using a standard library (e.g., Mahout), we implemented NBC ourselves to achieve fine-grained control of the analysis procedure. A Big Data analysis system was also designed for this study. The results are encouraging: the accuracy of NBC improves as the dataset size increases, approaching 82%. We have demonstrated that NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput.
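The abstract does not describe the internals of the authors' NBC implementation. As a minimal illustrative sketch only (not the authors' method), the Python code below assumes a multinomial Naïve Bayes model over bag-of-words tokens with Laplace (add-one) smoothing; the function names count_partition, merge, and predict and the toy reviews are hypothetical. It illustrates one general reason NBC suits large datasets: the trained model is just class counts and per-class token counts, so counts computed on separate data partitions can simply be summed.

```python
import math
from collections import Counter

# Hypothetical sketch: multinomial Naive Bayes with Laplace smoothing,
# trained by counting so that partial results can be merged.

def count_partition(docs):
    """Count documents per class and tokens per class for one partition.
    docs: iterable of (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = {}  # label -> Counter of token frequencies
    for tokens, label in docs:
        class_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
    return class_counts, token_counts

def merge(a, b):
    """Combine the counts of two partitions; NB statistics are additive."""
    class_counts = a[0] + b[0]
    token_counts = {}
    for label in set(a[1]) | set(b[1]):
        token_counts[label] = a[1].get(label, Counter()) + b[1].get(label, Counter())
    return class_counts, token_counts

def predict(model, tokens, alpha=1.0):
    """Return the label with the highest log posterior score."""
    class_counts, token_counts = model
    vocab = set()
    for counts in token_counts.values():
        vocab.update(counts)
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = token_counts.get(label, Counter())
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t not in vocab:  # ignore tokens never seen in training
                continue
            # Smoothed log likelihood of token t given the class
            score += math.log((counts[t] + alpha) / (total_tokens + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage: two partitions of hypothetical movie reviews.
part1 = count_partition([("great fun".split(), "pos"), ("boring plot".split(), "neg")])
part2 = count_partition([("great acting".split(), "pos"), ("awful boring".split(), "neg")])
model = merge(part1, part2)
print(predict(model, "great movie".split()))  # pos
```

Because merging only adds counters, the same pattern applies whether partitions are processed sequentially or in parallel; scoring in log space avoids floating-point underflow on long reviews.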
Keywords
Bayes methods; Big Data; data analysis; data mining; learning (artificial intelligence); pattern classification; text analysis; Big Data analysis; Mahout; NBC; dataset size; decision making; machine learning; movie reviews; naive Bayes classifier; opinion extraction; scalable sentiment classification; sentiment extraction; Accuracy; Data handling; Data storage systems; Information management; Mathematical model; Motion pictures; Training; Big data; Cloud computing; Polarity mining; sentiment classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691740
Filename
6691740