DocumentCode :
1791646
Title :
Identifying top Chinese network buzzwords from social media big data set based on time-distribution features
Author :
Yongli Tang ; Tingting He ; Bo Li ; Xiaohua Hu
Author_Institution :
Sch. of Comput., Central China Normal Univ., Wuhan, China
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
924
Lastpage :
931
Abstract :
Buzzwords are a main embodiment of Internet culture and play an important role in public opinion analysis, social focus tracking, and the study of language evolution. At present, questionnaires are widely used as the standard method for collecting network buzzwords, an approach that is subjective and costly. In this paper, we propose a novel algorithm that relies on the time-distribution feature of words and a KL-divergence measure to estimate word popularity and thereby identify the buzzwords of a specific period. The time-distribution feature captures the fact that a buzzword's usage increases sharply over a very short period, which is modeled formally with the KL-divergence measure. Compared with the traditional method, which requires considerable manual labor, the automatic algorithm presented here is more efficient. Moreover, buzzwords identified in this manner are not affected by individuals' subjective opinions, so they better reflect language usage in practice. When the algorithm is applied to a social media big data set, our experimental results show that the proposed approach accurately identifies the buzzwords of a given period, and that its output agrees closely with manually tagged results.
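The abstract does not give the exact formulation, so the following is only a minimal illustrative sketch of the general idea: a word's usage counts per time slice are normalized into a distribution, and its KL divergence from a baseline measures how sharply usage is concentrated in time. The uniform baseline, the function names (`time_distribution`, `kl_divergence`, `burst_score`), and the sample counts are all assumptions made for illustration, not the authors' method.

```python
import math


def time_distribution(counts):
    """Normalize per-slice usage counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("word never occurs in the observed period")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats; slices with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def burst_score(counts):
    """Divergence of a word's time distribution from a uniform baseline.

    Assumption: a uniform baseline stands in for 'steady usage'. The paper
    may use a different reference distribution and smoothing scheme.
    """
    p = time_distribution(counts)
    uniform = [1.0 / len(counts)] * len(counts)
    return kl_divergence(p, uniform)


# Hypothetical daily counts for two words over the same four days.
steady = [10, 10, 10, 10]
bursty = [0, 0, 40, 0]
ranked = sorted({"steady": steady, "bursty": bursty}.items(),
                key=lambda kv: burst_score(kv[1]), reverse=True)
```

Under this sketch, a word used evenly across the period scores zero, while a word whose usage is packed into a single slice scores the maximum, `log(n)` for `n` slices, so ranking by score surfaces the candidates with the sharpest rise.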
Keywords :
Big Data; social networking (online); text analysis; Internet culture; KL-divergence measure; buzzword usage; language evolution study; language usage; public opinion analysis; social focus tracking; social media Big Data set; standard method; subjective opinions; time-distribution feature; time-distribution features; time-distribution word feature; top-Chinese network buzzword Identification; word popularity estimation; Data models; Educational institutions; Information retrieval; Internet; Mathematical model; Probability distribution; Smoothing methods; KL divergence; buzzword; language model; time-distribution;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004324
Filename :
7004324