DocumentCode
608071
Title
PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce
Author
Weizhong Zhao ; Martha, V. ; Xiaowei Xu
Author_Institution
Coll. of Inf. Eng., Xiangtan Univ., Xiangtan, China
fYear
2013
fDate
25-28 March 2013
Firstpage
862
Lastpage
869
Abstract
Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that enables us to analyze big data on a cluster of computers. In this paper, we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for detecting clusters or community structures in big networks such as Twitter. PSCAN is based on the structural clustering algorithm SCAN, which not only finds clusters accurately but also identifies vertices playing special roles, such as hubs and outliers. An empirical evaluation of PSCAN using both real and synthetic networks demonstrated outstanding performance in terms of accuracy and running time. We analyzed a Twitter network with over 40 million users and 1.4 billion follower/following relationships by using PSCAN on a Hadoop cluster of 15 computers. The results show that PSCAN successfully detected interesting communities of people who share common interests.
Keywords
parallel algorithms; parallel programming; pattern clustering; social networking (online); Hadoop cluster; MapReduce programming model; PSCAN algorithm; Twitter; big network; parallel structural clustering algorithm; Accuracy; Benchmark testing; Clustering algorithms; Communities; Computers; Partitioning algorithms; Hadoop; MapReduce; Network clustering algorithms; big data; community structures
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on
Conference_Location
Barcelona
ISSN
1550-445X
Print_ISBN
978-1-4673-5550-6
Electronic_ISBN
1550-445X
Type
conf
DOI
10.1109/AINA.2013.47
Filename
6531844