  • DocumentCode
    2333500
  • Title
    Walking in Facebook: A Case Study of Unbiased Sampling of OSNs
  • Author
    Gjoka, Minas ; Kurant, Maciej ; Butts, Carter T. ; Markopoulou, Athina
  • Author_Institution
    Networked Syst., UC Irvine, Irvine, CA, USA
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    1
  • Lastpage
    9
  • Abstract
    With more than 250 million active users, Facebook (FB) is currently one of the most important online social networks. Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this quest, we consider and implement several candidate techniques. Two approaches that are found to perform well are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the "ground truth" (UNI, obtained through true uniform sampling of FB userIDs). In contrast, the traditional Breadth-First-Search (BFS) and Random Walk (RW) perform quite poorly, producing substantially biased results. In addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these can be used to effectively determine when a random walk sample is of adequate size and quality for subsequent use (i.e., when it is safe to cease sampling). Using these methods, we collect what is, to the best of our knowledge, the first unbiased sample of Facebook. Finally, we use one of our representative datasets, collected through MHRW, to characterize several key properties of Facebook.
  • Keywords
    convergence; data mining; search problems; social networking (online); BFS; Facebook; Metropolis-Hastings random walk; breadth-first-search; crawling; data collection; ground truth; online formal convergence diagnostics; online social network; re-weighted random walk; Character generation; Communications Society; Convergence; Facebook; Internet; Legged locomotion; Sampling methods; Social network services; Sociology; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2010 Proceedings IEEE
  • Conference_Location
    San Diego, CA
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4244-5836-3
  • Type
    conf
  • DOI
    10.1109/INFCOM.2010.5462078
  • Filename
    5462078