DocumentCode :
3687110
Title :
Improving the performance of graph analysis through partitioning with sampling
Author :
Michael M. Wolf;Benjamin A. Millery
Author_Institution :
Sandia National Laboratories, Albuquerque, NM 87185, United States
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.
Keywords :
"Sparse matrices","Stochastic processes","Laboratories","Runtime","Kernel","Generators","Signal processing"
Publisher :
ieee
Conference_Titel :
High Performance Extreme Computing Conference (HPEC), 2015 IEEE
Type :
conf
DOI :
10.1109/HPEC.2015.7322457
Filename :
7322457
Link To Document :
بازگشت