Title :
Implementing quasi-parallel breadth-first search in MapReduce for large-scale social network mining
Author :
Lianghong Qian ; Lei Fan ; Jianhua Li
Author_Institution :
Sch. of Inf. Security Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
Online social networks like Weibo and Twitter consist of billions of users and connections, and traditional approaches which are based on serial algorithms and leveraged only a single node or even a single core cannot suffice the that scale of data any more. We propose new distributed quasi-parallel breadth-first search scheme, the common graph traversal algorithm, based on the MapReduce framework, which has better performance (up to one scale of magnitude less time complexity for single-source cases or even better for multiple-source cases) than Pegasus, the state-of-the-art graph mining library, in terms of the complexity of computation and the I/O load. We apply our algorithms on the Weibo dataset, crawled from its website, which contains 135 million users and 10.2 billion directed connections among them, and occupies up to 400 gigabytes. The dataset is by far the largest one of online social networks in research. Based on the Weibo dataset with extremely skewed degree distribution, we give the empirical time complexity and I/O load analysis in each iteration of our proposed methods. Also, We ran the experiments on a 20-node Hadoop cluster to validate our analysis, and the results conform to our predicted empirical results.
Keywords :
computational complexity; data mining; graph theory; libraries; parallel algorithms; social networking (online); tree searching; Hadoop cluster; I-O load analysis; MapReduce framework; Pegasus; Twitter; Website; Weibo dataset; computational complexity; distributed quasi-parallel breadth-first search scheme; empirical time complexity; graph mining library; graph traversal algorithm; large-scale social network mining; serial algorithms; skewed degree distribution; Educational institutions; Time measurement; MapReduce; breadth-first search; graph mining; online social networks;
Conference_Titel :
Computational Aspects of Social Networks (CASoN), 2013 Fifth International Conference on
Conference_Location :
Fargo, ND
Print_ISBN :
978-1-4799-1407-4
DOI :
10.1109/CASoN.2013.6622606