Abstract:
Microblogging websites such as Twitter are growing increasingly popular as a new form of social media. Compared with traditional media such as the New York Times, tweets have a structured form and are much shorter. Although traditional topic modeling algorithms have been well studied, few algorithms are designed specifically to mine Twitter data according to its own features. In this paper, we exploit the structure of Twitter data and introduce the Multi Topic Distribution Model to mine topics. In addition, we observe that a tweet mostly discusses either public issues or personal life. Previous studies analyze all tweets equally and fail to discover the interests of each individual. By exploiting these features of Twitter data and dividing topics semantically into two types, our model not only discovers topics efficiently but also indicates which topics interest a particular user and which topics are hot issues in the Twitter community. We use Gibbs sampling for approximate inference and conduct experiments on the TREC2011 data set. Experimental results on this data set compare our model with Latent Dirichlet Allocation and the Author Topic Model. We also present examples of topics that interest the whole community and topics that interest several individual users.
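The abstract names Latent Dirichlet Allocation as a baseline and Gibbs sampling as the inference method. As a hedged illustration of what collapsed Gibbs sampling looks like for plain LDA, here is a minimal sketch. It shows the baseline only, not the Multi Topic Distribution Model, whose latent variables and conditionals are not given in this abstract. The function names `lda_gibbs` and `doc_topic_dist`, the hyperparameter defaults (alpha, beta), and the iteration count are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch).

    docs: non-empty list of documents, each a list of integer word ids.
    Returns (ndk, nkw, z): doc-topic counts, topic-word counts, and the
    topic assignment of every token.
    """
    rng = random.Random(seed)
    K = num_topics
    V = 1 + max(w for doc in docs for w in doc)  # vocabulary size
    ndk = [[0] * K for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(K)]  # tokens per (topic, word)
    nk = [0] * K                       # tokens per topic
    z = []                             # current topic of each token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional: p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw, z


def doc_topic_dist(ndk, alpha):
    """Posterior-mean estimate theta_dk = (n_dk + alpha) / (N_d + K * alpha)."""
    K = len(ndk[0])
    return [[(c + alpha) / (sum(row) + K * alpha) for c in row] for row in ndk]


if __name__ == "__main__":
    # Toy corpus of word ids standing in for tokenized tweets.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3, 5]]
    ndk, nkw, z = lda_gibbs(docs, num_topics=2, iters=50, seed=1)
    print(doc_topic_dist(ndk, alpha=0.1))
```

For a comparison like the one described above, the sampler would be run on tokenized tweets and the highest-count words in each row of `nkw` inspected as topics. A model that separates community topics from personal ones would add its own latent variables and conditionals, which this sketch does not attempt to reproduce.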
Keywords:
data structures; inference mechanisms; social networking (online); Gibbs sampling; New York Times; TREC2011 data set; Twitter community; Twitter data; approximate inference; author topic model; latent Dirichlet allocation; microblogging Web sites; mine topics; multitopic distribution model; social media; structured data form; topic discovery; topic modeling algorithms; Communities; Computational modeling; Data models; Equations; Mathematical model; Semantics; Twitter; Graphical Models