  • DocumentCode
    608071
  • Title
    PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce
  • Author
    Weizhong Zhao; V. Martha; Xiaowei Xu
  • Author_Institution
    College of Information Engineering, Xiangtan University, Xiangtan, China
  • fYear
    2013
  • fDate
    25-28 March 2013
  • Firstpage
    862
  • Lastpage
    869
  • Abstract
    Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that enables the analysis of big data on a cluster of computers. In this paper, we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for detecting clusters or community structures in big networks such as Twitter. PSCAN is based on the structural clustering algorithm SCAN, which not only finds clusters accurately but also identifies vertices that play special roles, such as hubs and outliers. An empirical evaluation of PSCAN on both real and synthetic networks demonstrated outstanding performance in terms of accuracy and running time. We analyzed a Twitter network with over 40 million users and 1.4 billion follower/following relationships using PSCAN on a Hadoop cluster of 15 computers. The results show that PSCAN successfully detected interesting communities of people who share common interests.
  • Keywords
    parallel algorithms; parallel programming; pattern clustering; social networking (online); Hadoop cluster; MapReduce programming model; PSCAN algorithm; Twitter; big network; parallel structural clustering algorithm; Accuracy; Benchmark testing; Clustering algorithms; Communities; Computers; Partitioning algorithms; Hadoop; MapReduce; Network clustering algorithms; big data; community structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA)
  • Conference_Location
    Barcelona
  • ISSN
    1550-445X
  • Print_ISBN
    978-1-4673-5550-6
  • Electronic_ISBN
    1550-445X
  • Type
    conf
  • DOI
    10.1109/AINA.2013.47
  • Filename
    6531844