Title :
Multilingual Sentiment Analysis Using Latent Semantic Indexing and Machine Learning
Author :
Bader, Brett W. ; Kegelmeyer, W. Philip ; Chew, Peter A.
Author_Institution :
Sandia Nat. Labs., Albuquerque, NM, USA
Abstract :
We present a novel approach to predicting the sentiment of documents in multiple languages, without translation. The only prerequisite is a multilingual parallel corpus wherein a training sample of the documents, in a single language only, has been tagged with its overall sentiment. Latent Semantic Indexing (LSI) converts that multilingual corpus into a multilingual "concept space". Both training and test documents can be projected into that space, allowing cross-lingual semantic comparisons between the documents without the need for translation. Accordingly, the training documents with known sentiment are used to build a machine learning model which can, because of the multilingual nature of the document projections, be used to predict sentiment in the other languages. We explain and evaluate the accuracy of this approach. We also design and conduct experiments to investigate the extent to which topic and sentiment separately contribute to that classification accuracy, and thereby shed some initial light on the question of whether topic and sentiment can be sensibly teased apart.
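The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four-document English/Spanish corpus, the labels, the choice of k = 2 concept dimensions, and the nearest-centroid classifier (a stand-in for the unspecified machine learning model) are all assumptions made for illustration, as are the helper names `term_vector`, `project`, and `predict`.

```python
import numpy as np

# Hypothetical toy parallel corpus: each pair is one document in English and Spanish.
parallel = [
    ("good great film", "buena gran pelicula"),
    ("bad awful film", "mala horrible pelicula"),
    ("great good story", "gran buena historia"),
    ("awful bad story", "horrible mala historia"),
]
# Assumed sentiment tags, available for the English side only (+1 positive, -1 negative).
en_labels = np.array([1, -1, 1, -1])

# Shared vocabulary over both languages.
vocab = sorted({w for en, es in parallel for w in (en + " " + es).split()})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text):
    """Raw term-count vector over the shared vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-by-document matrix: each column concatenates both language versions
# of one parallel document, so co-occurring translations share concepts.
X = np.column_stack([term_vector(en + " " + es) for en, es in parallel])

# LSI step: truncated SVD keeps the k strongest concept dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :k]


def project(text):
    """Project a document in any of the corpus languages into the concept space."""
    return term_vector(text) @ U_k


# Train on English projections only (nearest-centroid stand-in classifier).
train = np.array([project(en) for en, _ in parallel])
centroids = {label: train[en_labels == label].mean(axis=0) for label in (1, -1)}


def predict(text):
    """Return the label whose training centroid is closest in concept space."""
    z = project(text)
    return min(centroids, key=lambda label: np.linalg.norm(z - centroids[label]))


print(predict("buena pelicula"))
print(predict("mala historia"))
```

The classifier never sees a labeled Spanish document; the Spanish test phrases are classified only because the parallel corpus places translations of the same terms near each other in the concept space.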
Keywords :
document handling; indexing; learning (artificial intelligence); natural language processing; document translation; latent semantic indexing; machine learning model; multilingual concept space; multilingual parallel corpus; multilingual sentiment analysis; Accuracy; Large scale integration; Machine learning; Predictive models; Semantics; Training; Vectors; Sentiment analysis; latent semantic analysis; machine learning; multilingual; parallel corpora
Conference_Title :
Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4673-0005-6
DOI :
10.1109/ICDMW.2011.185