Regional Information Center for Science and Technology - Towards efficient SimRank computation on large networks

DocumentCode :

610354

Title :

Towards efficient SimRank computation on large networks

Author :

Weiren Yu ; Xuemin Lin ; Wenjie Zhang

Author_Institution :

Univ. of New South Wales, Sydney, NSW, Australia

fYear :

2013

fDate :

8-12 April 2013

Firstpage :

601

Lastpage :

612

Abstract :

SimRank has been a powerful model for assessing the similarity of pairs of vertices in a graph. It is based on the concept that two vertices are similar if they are referenced by similar vertices. Due to its self-referentiality, fast SimRank computation on large graphs poses significant challenges. The state-of-the-art work [17] exploits partial sums memorization for computing SimRank in O(Kmn) time on a graph with n vertices and m edges, where K is the number of iterations. Partial sums memorization can reduce repeated calculations by caching part of the similarity summations for later reuse. However, we observe that computations among different partial sums may themselves be redundant. Besides, for a desired accuracy ϵ, the existing SimRank model requires K = ⌈log_C ϵ⌉ iterations [17], where C is a damping factor. Nevertheless, such a geometric rate of convergence is slow in practice if a high accuracy is desirable. In this paper, we address these gaps. (1) We propose an adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and devise an efficient algorithm for speeding up the computation of SimRank to O(Kd′n²) time, where d′ is typically much smaller than the average in-degree of a graph. (2) We also present a new notion of SimRank that is based on a differential equation and can be represented as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) Using real and synthetic data, we empirically verify that our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude, and that our revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores.
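
The baseline that the abstract attributes to [17] can be illustrated with a minimal Python sketch. It assumes the standard SimRank recurrence, s(a,b) = C / (|I(a)||I(b)|) · Σ_{i∈I(a)} Σ_{j∈I(b)} s(i,j) with s(a,a) = 1, where I(v) is the in-neighbor set of v, and it caches one partial sum per source vertex so that the inner summation is reused for every b. The function name, the dictionary-based graph representation, and the default values of C and eps are hypothetical choices made for illustration; this is not the paper's adaptive clustering algorithm or its differential variant.

```python
import math


def simrank_partial_sums(in_neighbors, C=0.6, eps=1e-3):
    """Iterative SimRank with per-vertex partial sums caching.

    in_neighbors maps each vertex to the list of its in-neighbors.
    Every vertex of the graph must appear as a key.
    """
    nodes = list(in_neighbors)
    # Iteration count K = ceil(log_C eps), as stated in the abstract.
    K = math.ceil(math.log(eps) / math.log(C))

    # s_0(a, b) = 1 if a == b, else 0.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(K):
        new = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
        for a in nodes:
            Ia = in_neighbors[a]
            if not Ia:
                continue  # no in-neighbors: similarity to others stays 0
            # Partial sum over I(a): computed once, reused for every b.
            partial = {j: sum(sim[i][j] for i in Ia) for j in nodes}
            for b in nodes:
                Ib = in_neighbors[b]
                if a == b or not Ib:
                    continue
                total = sum(partial[j] for j in Ib)
                new[a][b] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim


# Hypothetical toy graph: u points to both a and b.
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank_partial_sums(graph)
print(scores["a"]["b"])
```

In the toy graph, a and b share their only in-neighbor u, so their score equals the damping factor C. Computing `partial` costs O(n·|I(a)|) per vertex a and the reuse step costs O(m), which gives the O(Kmn) total quoted in the abstract; the redundancy the paper targets arises when different vertices have overlapping in-neighbor sets and therefore recompute overlapping partial sums.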

Keywords :

computational complexity; graph theory; iterative methods; matrix algebra; pattern clustering; SimRank computation; SimRank iteration; adaptive clustering strategy; differential equation; duplicate redundancy; geometric rate; graph; partial sums memorization; partial sums redundancy; similarity assessment; similarity summation caching; transition matrix; Accuracy; Clustering algorithms; Computational modeling; Convergence; Damping; Optimization; Redundancy;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering (ICDE), 2013 IEEE 29th International Conference on

Conference_Location :

Brisbane, QLD

ISSN :

1063-6382

Print_ISBN :

978-1-4673-4909-3

Electronic_ISBN :

1063-6382

Type :

conf

DOI :

10.1109/ICDE.2013.6544859

Filename :

6544859

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=610354