DocumentCode :
3734078
Title :
Sensibility estimation method for youth slang by using sensibility co-occurrence feature vector obtained from microblog
Author :
Kazuyuki Matsumoto;Minoru Yoshida;Kenji Kita
Author_Institution :
Faculty of Engineering, Tokushima University, Tokushima city, Japan
fYear :
2015
Firstpage :
473
Lastpage :
478
Abstract :
Social networking sites such as Twitter give people more opportunities to express what they think or intend in short text. Short text often contains emoticons and abbreviations such as "ASAP" or "joinus". Because these expressions are not registered in existing dictionaries, they are analyzed as unknown expressions, which can be a bottleneck for improving the accuracy of reputation analysis in text mining. Clustering unknown words by their context is a common approach; however, it usually requires word segmentation and is vulnerable to segmentation errors on unknown expressions such as youth slang. In this paper, we propose a method that obtains the appropriate context even when unknown expressions cause segmentation errors, and that estimates the sensibility expressed in the text. Because the resulting context vector has an enormous number of dimensions, we also propose a method that builds a feature vector from the co-occurrence of sensibility words, giving a simple, low-dimensional representation. In an evaluation experiment, the proposed method achieved a certain level of accuracy even with a small amount of training data.
Keywords :
"Dictionaries","Twitter","Context","Estimation","Feature extraction","Training data","Thesauri"
Publisher :
IEEE
Conference_Title :
Computer and Communications (ICCC), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8125-3
Type :
conf
DOI :
10.1109/CompComm.2015.7387618
Filename :
7387618