Title :
Pegasus: Mining billion-scale graphs in the cloud
Author :
Kang, U. ; Chau, Duen Horng ; Faloutsos, Christos
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
We have entered in an era of big data. Graphs are now measured in terabytes or even petabytes; analyzing them has become increasingly challenging. How do we find patterns and anomalies in these graphs that no longer fit in memory? How should we exploit parallel computation to boost our analysis capabilities? We present PEGASUS, the first open-source, peta-scale graph mining library, for the HADOOP platform (open-source implementation of MAPREDUCE). By observing that many graph mining operations can be described by repeated matrix-vector multiplications, we devised an important primitive called GIM-V for PEGASUS that applies to all such operations. GIM-V (Generalized Iterative Matrix-Vector multiplication) is highly optimized, achieving (1) good scale-up with the number of machines, (2) linear run time on the number of edges, and (3) more than 9 times faster performance over the non-optimized version. We ran experiments for PEGASUS on M45, one of the largest HADOOP clusters in the world. We report our findings on several real graphs with billions of nodes and edges. Selected findings include (a) the discovery of adult advertisers in the who-follows-whom on Twitter, and (b) the 7-degrees of separation in the Web graph.
Keywords :
cloud computing; data mining; iterative methods; matrix multiplication; public domain software; social networking (online); vectors; GIM-V; HADOOP cluster platform; MAPREDUCE; Pegasus; Twitter; Web graph; billion-scale graph mining; cloud computing; generalized iterative matrix-vector multiplication; matrix-vector multiplications; open-source peta-scale graph mining library; Algorithm design and analysis; Belief propagation; Data mining; Sparse matrices; Symmetric matrices; Twitter; HADOOP; PEGASUS; graph mining;
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6289127