Title :
Identifying top Chinese network buzzwords from social media big data set based on time-distribution features
Author :
Yongli Tang ; Tingting He ; Bo Li ; Xiaohua Hu
Author_Institution :
Sch. of Comput., Central China Normal Univ., Wuhan, China
Abstract :
Buzzwords are the main embodiment of Internet culture and play an important role in public opinion analysis, social focus tracking and the study of language evolution. At present, questionnaires are widely used as the standard method for collecting network buzzwords, but this method is subjective and costly. In this paper, we propose a novel algorithm that relies on the time-distribution feature of words and a KL-divergence measure to estimate word popularity and thereby identify the buzzwords of a specific period. The time-distribution feature captures the fact that the usage of a buzzword increases sharply within a very short period; this behavior is modeled formally with the KL-divergence measure. Compared with the traditional method, which requires substantial manual effort, the automatic algorithm presented here is more efficient. Moreover, buzzwords identified in this manner are not affected by individuals' subjective opinions, so they better reflect language usage in practice. When the algorithm is applied to a social media big data set, our experimental results show that the proposed approach accurately identifies the buzzwords of a given period, and its output agrees closely with manually tagged results.
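The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the uniform reference distribution, the additive smoothing constant `alpha`, and all function names below are assumptions made for illustration only. Each word's per-period counts are normalized into a time distribution, and the KL divergence from a flat distribution serves as a burstiness score.

```python
import math

def time_distribution(counts, alpha=1.0):
    """Turn per-period counts into an additively smoothed probability distribution.
    (alpha is an assumed smoothing constant, not taken from the paper.)"""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def burst_score(counts, alpha=1.0):
    """Divergence of a word's time distribution from a uniform one.
    A flat usage pattern scores near 0; a sharp spike scores high.
    (Using the uniform distribution as the reference is an assumption.)"""
    p = time_distribution(counts, alpha)
    uniform = [1.0 / len(counts)] * len(counts)
    return kl_divergence(p, uniform)

def top_buzzwords(word_counts, k=10, alpha=1.0):
    """Rank words by burst score and return the k highest-scoring ones."""
    scored = {w: burst_score(c, alpha) for w, c in word_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical usage: counts per time period for two made-up words.
word_counts = {
    "steady": [50, 50, 50, 50, 50, 50],
    "bursty": [1, 1, 1, 200, 1, 1],
}
ranking = top_buzzwords(word_counts, k=2)
```

In this sketch a word used evenly across all periods has a score close to zero, while a word whose usage is concentrated in one period scores much higher, which matches the "sharp increase during a very short period" described in the abstract.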
Keywords :
Big Data; social networking (online); text analysis; Internet culture; KL-divergence measure; buzzword usage; language evolution study; language usage; public opinion analysis; social focus tracking; social media Big Data set; standard method; subjective opinions; time-distribution feature; time-distribution features; time-distribution word feature; top-Chinese network buzzword Identification; word popularity estimation; Data models; Educational institutions; Information retrieval; Internet; Mathematical model; Probability distribution; Smoothing methods; KL divergence; buzzword; language model; time-distribution;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004324