DocumentCode :
140840
Title :
Locality-sensitive operators for parallel main-memory database clusters
Author :
Rödiger, Wolf ; Mühlbauer, Tobias ; Unterbrunner, Philipp ; Reiser, Angelika ; Kemper, Alfons ; Neumann, Thomas
Author_Institution :
Tech. Univ. München, Munich, Germany
fYear :
2014
fDate :
March 31 2014-April 4 2014
Firstpage :
592
Lastpage :
603
Abstract :
The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key, but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the network phase duration; (ii) communication scheduling avoids bandwidth underutilization due to cross traffic; (iii) adaptive radix partitioning retains locality during data repartitioning and handles value skew gracefully; and (iv) selective broadcast reduces network communication in the presence of extreme value skew or large numbers of duplicates. We present comprehensive experimental results, which show that our techniques can improve performance by up to a factor of 5 for fuzzy co-location and a factor of 3 for inputs with value skew.
Keywords :
data handling; parallel databases; adaptive radix partitioning technique; aggregation operator; communication scheduling technique; distributed processing; fuzzy colocation; join operator; locality-sensitive data shuffling; locality-sensitive operators; network bandwidth; network communication; optimal partition assignment technique; parallel main-memory database clusters; performance gap; query processing; relation copartitioning; selective broadcast technique; value skew; Algorithm design and analysis; Bandwidth; Correlation; Database systems; Distributed databases; Partitioning algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816684
Filename :
6816684