DocumentCode
2711049
Title
Document-Word Co-regularization for Semi-supervised Sentiment Analysis
Author
Sindhwani, Vikas ; Melville, Prem
Author_Institution
IBM T. J. Watson Research Center, Yorktown Heights, NY
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
1025
Lastpage
1030
Abstract
The goal of sentiment prediction is to automatically identify whether a given piece of text expresses a positive or negative opinion towards a topic of interest. One can pose sentiment prediction as a standard text categorization problem, but gathering labeled data turns out to be a bottleneck. Fortunately, background knowledge is often available in the form of prior information about the sentiment polarity of words in a lexicon. Moreover, in many applications, abundant unlabeled data is also available. In this paper, we propose a novel semi-supervised sentiment prediction algorithm that utilizes lexical prior knowledge in conjunction with unlabeled examples. Our method performs joint sentiment analysis of documents and words using a bipartite graph representation of the data. We present an empirical study on a diverse collection of sentiment prediction problems, which confirms that our semi-supervised lexical models significantly outperform purely supervised and competing semi-supervised techniques.
Keywords
graph theory; least squares approximations; text analysis; word processing; bipartite graph representation; document joint sentiment analysis; document-word co-regularization; prediction algorithm; semi-supervised sentiment prediction algorithm; standard regularized least square; text categorization problem; Blogs; Data mining; Discussion forums; Frequency; Machine learning; Motion pictures; Prediction algorithms; Text analysis; Text categorization; Vectors; Graph Transduction; Linear models; Semi-supervised Learning; Sentiment Analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
Conference_Location
Pisa
ISSN
1550-4786
Print_ISBN
978-0-7695-3502-9
Type
conf
DOI
10.1109/ICDM.2008.113
Filename
4781219