Title :
Top-K aggregation over a large graph using shared-nothing systems
Author :
Chakraborty, Arpan
Author_Institution :
Sch. of Inf. & Comput. Sci., Indiana Univ., Bloomington, IN, USA
Abstract :
Analyzing large graphs is crucial to a variety of application domains, like personalized recommendations in social networks, search engines, communication networks, computational biology, etc. In these domains, there is a need to process aggregation queries over large graphs. Existing approaches for aggregation are not suitable for large graphs, as they involve multi-way relational joins over gigantic tables or repeated multiplications of large matrices. In this paper, we consider top-K aggregation queries that involve identifying top-K nodes with highest aggregate values over their h-hop neighbors. We propose algorithms for processing such queries over large graphs in a shared nothing environment. Using the notion of graph partitioning, we propose an update-based algorithm that minimizes network overhead by propagating updates in the neighborhood information. The algorithm partitions a graph across a number of processing nodes, and uses an iterative join algorithm within each node. We present a hybrid scheme to further reduce the network overhead during a few initial iterations. We develop a baseline algorithm based on distributed joins. Our experimental results validate the effectiveness of the proposed algorithms in reducing the aggregation time and in scaling the aggregation computation over a number of distributed hosts.
Keywords :
graph theory; iterative methods; query processing; recommender systems; social networking (online); aggregation computation; baseline algorithm; communication networks; computational biology; graph partitioning; h-hop neighbors; iterative join algorithm; large graphs; multiway relation; network overhead; personalized recommendations; search engines; shared nothing environment; shared-nothing systems; social networks; top-K aggregation query; top-k aggregation; update-based algorithm;
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691606