  • DocumentCode
    257194
  • Title
    A probabilistic approach towards modeling email network with realistic features
  • Author
    Quangang Li ; Jinqiao Shi ; Tingwen Liu ; Li Guo ; Zhiguang Qin
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
  • fYear
    2014
  • fDate
    4-7 Aug. 2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Email plays a very important role in our daily life, and much work has been devoted to studying email networks. These studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much of this research is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, very few large-scale email datasets are available that satisfy different research purposes. Due to privacy policies and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are therefore commonly used to create synthetic email networks. However, these models focus on reproducing several structural properties of the network without considering user behaviour patterns, so they are not appropriate for generating large-scale, realistic synthetic email network datasets. To this end, we propose a probabilistic model with which large-scale synthetic email datasets can be constructed from a small captured email log. More importantly, the generated synthetic dataset matches both real email network properties and individual communication patterns. Moreover, the model has linear complexity and can be easily parallelized. Experimental results on the Enron dataset demonstrate these benefits of our model.
  • Keywords
    computational complexity; data privacy; electronic mail; probability; social networking (online); Enron dataset; captured email log; email communication; email network property; individual communication pattern; large-scale email dataset; large-scale realistic synthetic email network dataset; linear complexity; network evolution; privacy policy; probabilistic approach; probabilistic model; realistic features; social network model; structural property; user behaviour pattern; user information; Analytical models; Communities; Complexity theory; Computational modeling; Electronic mail; Social network services; Training; Dirichlet; Email network; generative model; simulation; snapshot;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Communication and Networks (ICCCN), 2014 23rd International Conference on
  • Conference_Location
    Shanghai
  • Type
    conf
  • DOI
    10.1109/ICCCN.2014.6911760
  • Filename
    6911760