DocumentCode
714277
Title
Traversal optimizations and analysis for large graph clustering
Author
Nawaz, Waqas ; Young-Koo Lee
Author_Institution
Dept. of Comput. Eng., Kyung Hee Univ., Yongin, South Korea
fYear
2015
fDate
13-17 April 2015
Firstpage
260
Lastpage
264
Abstract
Graph is an extremely versatile data structure in terms of its expressiveness and flexibility to model a range of real life phenomenon, such as social, biological, sensor, and computer networks. Finding groups of vertices based on their similarity is the fundamental graph mining task to get useful insights. The existing methods suffer from scalability issues due to enormous computations of an exact similarity estimation. Therefore, we introduce Collaborative Similarity Measure (CSM) based on shortest path strategy, instead of all paths, to define structural and semantic relevance among vertices efficiently. We evaluate this measure for personalized email community detection as an application scenario. However, an abundance of structural information has resulted in non-trivial graph traversals. Shortcut construction is among the utilized techniques implemented for efficient shortest path (SP) traversals on graphs. The shortcut construction, being a computationally intensive task, required to be exclusive and offline, often produces unnecessary auxiliary data. To overcome this issue, we present Shortest Path Overlapped Region (SPORE), a performance-based initiative that improves the shortcut construction performance by exploiting SP overlapped regions. Path overlapping with empirical analysis has been overlooked by shortcut construction systems. SPORE avails this opportunity and provides a solution by constructing auxiliary shortcuts incrementally, using SP trees during traversals, instead of an exclusive step. SPORE is exposed to a graph clustering task, which requires extensive graph traversals to group similar vertices together, for realistic implications. We further suggest an optimization strategy to accelerate the performance of the clustering process using confined subgraph traversals. Leveraging the SPORE with multiple SP computations consistently reduces the latency of the entire clustering process. A parameter-free graph clustering with scalable graph trave- sal strategy for a billion scale graph remain an open issue.
Keywords
data mining; electronic mail; graph theory; optimisation; pattern clustering; CSM; SP trees; SPORE; collaborative similarity measure; confined subgraph traversals; graph mining task; nontrivial graph traversals; parameter-free graph clustering; personalized e-mail community detection; shortest path graph traversals; shortest path overlapped region; traversal optimizations; versatile data structure; Clustering algorithms; Collaboration; Communities; Data mining; Electronic mail; Semantics; Social network services;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering Workshops (ICDEW), 2015 31st IEEE International Conference on
Conference_Location
Seoul
Type
conf
DOI
10.1109/ICDEW.2015.7129587
Filename
7129587
Link To Document