Title :
Extracting Two-Noun Phrases from Customer Reviews
Author :
Wang, Hui ; Chen, Jiansheng
Author_Institution :
Coll. of Comput. Sci. & Inf. Eng., Tianjin Univ. of Sci. & Technol., Tianjin, China
Abstract :
The Web contains a huge amount of information in unstructured text. Analyzing this text is increasingly important as more and more people post product reviews at merchant sites, discussion groups, etc. This paper presents a set of language patterns, composed of 22 rules, for extracting two-noun phrases from customer reviews. Two-noun phrases are more specific and more interesting than single nouns. Normally, these phrases contain product features that are useful to customers. Three tagging methods are used to generate part-of-speech tags for Bing Liu's dataset. On average, recall is above 90 percent regardless of the tagging method used. With this set of rules, at least 22 percent of the two-noun phrases in each category, which are useless and do not need to be extracted, are kept from being extracted. Additionally, the language rules can extract some useful product features that human taggers failed to annotate.
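As a rough illustration of what a part-of-speech pattern for two-noun phrases might look like, the sketch below applies a single hypothetical rule: keep two adjacent noun-tagged tokens that are not part of a longer run of nouns. This is not one of the paper's 22 rules, which are not given in the abstract. The function name, the sample sentence, and the use of Penn Treebank style tags (NN, NNS, ...) are all assumptions made for the example.

```python
from typing import List, Tuple

# A token paired with its part-of-speech tag, e.g. ("battery", "NN").
TaggedToken = Tuple[str, str]


def is_noun(tag: str) -> bool:
    # Assumption: Penn Treebank style tags, where noun tags start with "NN".
    return tag.startswith("NN")


def extract_two_noun_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Return adjacent noun-noun pairs that are not part of a longer noun run.

    Hypothetical single rule for illustration only; not the paper's rule set.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        if not (is_noun(t1) and is_noun(t2)):
            continue
        # Skip the pair if a noun sits immediately before or after it.
        prev_is_noun = i > 0 and is_noun(tagged[i - 1][1])
        next_is_noun = i + 2 < len(tagged) and is_noun(tagged[i + 2][1])
        if prev_is_noun or next_is_noun:
            continue
        phrases.append(f"{w1} {w2}".lower())
    return phrases


# Hand-tagged sample sentence (made up for this example).
sentence = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("lens", "NN"),
    ("cap", "NN"), ("feels", "VBZ"), ("cheap", "JJ"),
]
print(extract_two_noun_phrases(sentence))  # ['battery life', 'lens cap']
```

In practice the tagged input would come from a part-of-speech tagger rather than being written by hand, and a realistic rule set would also need patterns that reject uninformative pairs, which is the filtering role the abstract attributes to its rules.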
Keywords :
Internet; electronic commerce; identification technology; natural language processing; text analysis; Bing Liu's dataset; World Wide Web; customer reviews; e-commerce; language patterns; language rules; part-of-speech tags; post product reviews; tagging methods; two-noun phrase extraction; unstructured text analysis; Computer science; Data mining; Educational institutions; Electronic mail; Humans; Itemsets; Search engines; Speech; Tagging; Writing;
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
DOI :
10.1109/CISE.2009.5366577