Title :
Using Naïve Bayes Classifier to Distinguish Reviews from Non-review Documents in Chinese
Author :
Zi-qiong, Zhang ; Qiang, Ye ; Yi-jun, Li
Author_Institution :
Harbin Inst. of Technol., Harbin
Abstract :
Reviews are subjective documents that express opinions or evaluations, whereas non-review documents usually present factual information objectively. Separating reviews from non-reviews, known as subjectivity classification, is potentially important for many text processing applications, such as information extraction and information retrieval, and it is a key step in sentiment classification of online customer reviews. As a type of genre classification, classifying texts as subjective or objective differs from traditional topic-based classification. Few studies have addressed this problem, and most of them deal with English texts; little work has been done on Chinese subjectivity classification. The techniques developed for English cannot be applied directly to Chinese because the two languages have different characteristics. This paper proposes an approach to subjectivity classification of Chinese text based on a supervised machine learning algorithm, Naïve Bayes. Experiments were conducted on two kinds of documents written in Chinese: movie reviews and movie plots. The results show that the performance of the proposed approach is comparable to that reported in existing English subjectivity classification studies.
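Illustration :
The abstract does not say which features or Naïve Bayes variant the authors used, so the following is only a minimal sketch of the general idea and not the paper's method. It assumes a multinomial Naïve Bayes model with add-one (Laplace) smoothing over character bigrams, a common way to avoid Chinese word segmentation. The class name, the helper function, and the six toy sentences are all invented for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Overlapping character bigrams; an assumed feature set that needs no word segmenter."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical helper class)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        # Per-class bigram frequencies.
        self.feat_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(char_bigrams(doc))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Unnormalised log P(class) + sum of log P(bigram | class)."""
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.n_docs)
            for feat in char_bigrams(doc):
                if feat not in self.vocab:
                    continue  # bigrams never seen in training are ignored
                score += math.log(
                    (self.feat_counts[c][feat] + 1) / (self.totals[c] + vocab_size)
                )
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy data: three opinionated sentences and three plot-style sentences.
train_docs = [
    "这部电影太好看了，我非常喜欢",
    "剧情很无聊，演员的表演让我失望",
    "我觉得这部片子非常感人",
    "故事讲述一名警察追查一宗谋杀案",
    "影片讲述两个年轻人在上海相遇",
    "故事发生在一九三零年的北京",
]
train_labels = ["review", "review", "review", "plot", "plot", "plot"]

model = NaiveBayesSketch().fit(train_docs, train_labels)
print(model.predict("我非常喜欢这部电影"))
print(model.predict("影片讲述一名警察的故事"))
```

A real experiment would need a labelled corpus of movie reviews and plot summaries plus a held-out test set; the toy sentences only show the mechanics of training and prediction.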
Keywords :
Bayes methods; information retrieval; learning (artificial intelligence); natural language processing; text analysis; Chinese subjectivity classification; English text processing; document expressing opinion; genre classification; information extraction; naive Bayes classifier; online customer review; sentiment classification; supervised machine learning algorithm; Conference management; Data mining; Engineering management; Motion pictures; Natural languages; Prototypes; Search engines; Technology management; Text processing; Chinese; movie review; subjectivity classification
Conference_Title :
Management Science and Engineering, 2007. ICMSE 2007. International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-7-88358-080-5
Electronic_ISBN :
978-7-88358-080-5
DOI :
10.1109/ICMSE.2007.4421834