DocumentCode :
2958942
Title :
SAHAD: Subgraph Analysis in Massive Networks Using Hadoop
Author :
Zhao, Zhao ; Wang, Guanying ; Butt, Ali R. ; Khan, Maleq ; Kumar, V. S Anil ; Marathe, Madhav V.
Author_Institution :
Network Dynamics & Simulation Sci. Lab., Virginia Tech, Blacksburg, VA, USA
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
390
Lastpage :
401
Abstract :
Relational sub graph analysis, e.g. finding labeled sub graphs in a network, which are isomorphic to a template, is a key problem in many graph related applications. It is computationally challenging for large networks and complex templates. In this paper, we develop SAHAD, an algorithm for relational sub graph analysis using Hadoop, in which the sub graph is in the form of a tree. SAHAD is able to solve a variety of problems closely related with sub graph isomorphism, including counting labeled/unlabeled sub graphs, finding supervised motifs, and computing graph let frequency distribution. We prove that the worst case work complexity for SAHAD is asymptotically very close to that of the best sequential algorithm. On a mid-size cluster with about 40 compute nodes, SAHAD scales to networks with up to 9 million nodes and a quarter billion edges, and templates with up to 12 nodes. To the best of our knowledge, SAHAD is the first such Hadoop based subgraph/subtree analysis algorithm, and performs significantly better than prior approaches for very large graphs and templates. Another unique aspect is that SAHAD is also amenable to running quite easily on Amazon EC2, without needs for any system level optimization.
Keywords :
computational complexity; distributed processing; trees (mathematics); Amazon EC2; SAHAD; graph let frequency distribution computing; labeled-unlabeled subgraph counting; mid-size cluster; relational subgraph analysis in massive networks using Hadoop; sequential algorithm; subgraph isomorphism; subgraph-subtree analysis algorithm; supervised motif finding; worst case work complexity; Algorithm design and analysis; Color; Complexity theory; Encoding; Heuristic algorithms; Image color analysis; Partitioning algorithms; Hadoop; MapReduce; frequent subgraph; graphlet frequency distribution; motif; subgraph isomorphism;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-4673-0975-2
Type :
conf
DOI :
10.1109/IPDPS.2012.44
Filename :
6267876
Link To Document :
بازگشت