DocumentCode :
1759081
Title :
HEigen: Spectral Analysis for Billion-Scale Graphs
Author :
Kang, U. ; Meeder, Brendan ; Papalexakis, Evangelos E. ; Faloutsos, Christos
Author_Institution :
Comput. Sci. Dept., KAIST, Daejeon, South Korea
Volume :
26
Issue :
2
fYear :
2014
fDate :
Feb. 2014
Firstpage :
350
Lastpage :
362
Abstract :
Given a graph with billions of nodes and edges, how can we find patterns and anomalies? Are there nodes that participate in too many or too few triangles? Are there close-knit near-cliques? These questions are expensive to answer unless we have the first several eigenvalues and eigenvectors of the graph adjacency matrix. However, eigensolvers suffer from subtle problems (e.g., convergence) for large sparse matrices, let alone for billion-scale ones. We address this problem with the proposed HEIGEN algorithm, which we carefully design to be accurate, efficient, and able to run on the highly scalable MAPREDUCE (HADOOP) environment. This enables HEIGEN to handle matrices more than 1;000 × larger than those which can be analyzed by existing algorithms. We implement HEIGEN and run it on the M45 cluster, one of the top 50 supercomputers in the world. We report important discoveries about nearcliques and triangles on several real-world graphs, including a snapshot of the Twitter social network (56 Gb, 2 billion edges) and the “YahooWeb” data set, one of the largest publicly available graphs (120 Gb, 1.4 billion nodes, 6.6 billion edges).
Keywords :
data mining; graph theory; parallel machines; pattern recognition; spectral analysis; HADOOP platform; HEigen; M45 cluster; Twitter social network; YahooWeb data set; billion-scale graphs; eigensolver; graph mining; matrices; open-source MapReduce framework; pattern discovery; publicly available graphs; real-world graphs; spectral analysis; supercomputers; Algorithm design and analysis; Eigenvalues and eigenfunctions; LinkedIn; Sparse matrices; Symmetric matrices; Twitter; Web pages; HEigen; Hadoop; MapReduce; Spectral analysis; graph mining;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2012.244
Filename :
6384533
Link To Document :
بازگشت