Title :
Learning Stochastic Models of Information Flow
Author :
Dickens, Luke ; Molloy, Ian ; Lobo, Jorge ; Cheng, Pau-Chen ; Russo, Alessandra
Author_Institution :
Imperial Coll. London, London, UK
Abstract :
An understanding of information flow has many applications, including maximizing marketing impact on social media, limiting malware propagation, and managing undesired disclosure of sensitive information. This paper presents scalable methods both for learning models of information flow in networks from data, based on the Independent Cascade Model, and for predicting probabilities of unseen flow from these models. Our approach is based on a principled probabilistic construction, and results compare favourably with existing methods in terms of prediction accuracy and scalability of evaluation. In addition, we are able to evaluate a broader range of queries than previously shown, including the probability of joint and/or conditional flow, and to reflect model uncertainty. Exact evaluation of flow probabilities is exponential in the number of edges, and naive sampling can also be expensive, so we propose efficient Markov-Chain Monte-Carlo sampling using the Metropolis-Hastings algorithm; details are given in the paper. We identify two types of data: attributed data, where the paths of past flows are known, and unattributed data, where only the endpoints are known. Both data types are addressed in this paper, including training methods, example real-world data sets, and experimental evaluation. In particular, we investigate flow data from the Twitter microblogging service, exploring the flow of messages through retweets (tweet forwards) for the attributed case, and the propagation of hash tags (metadata tags) and URLs for the unattributed case.
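As an illustration of the "naive sampling" baseline that the abstract contrasts with its Metropolis-Hastings approach, the following is a minimal sketch of estimating a flow probability under the Independent Cascade Model by plain Monte Carlo simulation. It is not the paper's method. The function names (`simulate_cascade`, `estimate_flow`), the adjacency-dictionary layout, and the example graph with its edge probabilities are all assumptions made for this example.

```python
import random
from collections import deque


def simulate_cascade(edges, source, rng):
    """Run one Independent Cascade simulation and return the set of reached nodes.

    `edges` is a hypothetical layout: {node: [(neighbour, activation_prob), ...]}.
    """
    active = {source}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v, p in edges.get(u, []):
            # A newly active node gets one independent attempt per outgoing edge.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_flow(edges, source, target, n_samples=20000, seed=0):
    """Estimate P(information from `source` reaches `target`) by repeated simulation."""
    rng = random.Random(seed)
    hits = sum(
        target in simulate_cascade(edges, source, rng) for _ in range(n_samples)
    )
    return hits / n_samples


# Illustrative graph (made-up probabilities): two disjoint two-hop paths a -> d.
example_edges = {
    "a": [("b", 0.5), ("c", 0.5)],
    "b": [("d", 0.5)],
    "c": [("d", 0.5)],
}

if __name__ == "__main__":
    print(estimate_flow(example_edges, "a", "d"))
```

For this small graph the exact answer can be checked by hand: each path succeeds with probability 0.25, so the flow probability is 1 - (1 - 0.25)^2 = 0.4375. On larger graphs the cost of each simulation and the number of samples needed for rare flows grow quickly, which is the expense the abstract refers to.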
Keywords :
Markov processes; Monte Carlo methods; data flow analysis; directed graphs; learning (artificial intelligence); meta data; probability; sampling methods; security of data; social networking (online); Markov-Chain Monte-Carlo method; Metropolis-Hastings algorithm; Twitter microblogging service; URL; conditional flow probability; data type identification; directed graph; hash tag propagation; independent cascade model; information flow; joint flow probability; malware propagation; marketing impact maximization; message flow; metadata tags; model uncertainty; naive sampling; principled probabilistic construction; retweet; sensitive information undesired disclosure management; social media; stochastic model learning; tweet forward; unattributed data; unseen flow probability; Accuracy; Data models; Equations; Joints; Predictive models; Twitter; Uncertainty;
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.103