• DocumentCode
    688323
  • Title
    QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory
  • Author
    Cheng, Long ; Kotoulas, Spyros ; Ward, Tomas E. ; Theodoropoulos, Georgios
  • Author_Institution
    Nat. Univ. of Ireland, Maynooth, Ireland
  • fYear
    2013
  • fDate
    13-15 Nov. 2013
  • Firstpage
    1519
  • Lastpage
    1527
  • Abstract
    The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable and, under high data skew, runs faster with less network communication than the state-of-the-art PRPD approach [1].
  • Keywords
    data handling; distributed memory systems; parallel programming; query processing; APGAS parallel programming model; Big Data; QbDJ; asynchronous partitioned global address space parallel programming model; data skew handling; duplication-based framework; hash-based framework; parallel distributed data management systems; parallel join processing; query-based distributed join; Arrays; Distributed databases; Histograms; Instruction sets; Parallel processing; Probes; Silicon; Distributed join; X10; data skew; high performance; parallel join;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on
  • Conference_Location
    Zhangjiajie
  • Type
    conf
  • DOI
    10.1109/HPCC.and.EUC.2013.214
  • Filename
    6832096