• DocumentCode
    3738235
  • Title

    A hybrid design for high performance large-scale sorting on FPGA

  • Author

    Ajitesh Srivastava;Ren Chen;Viktor K. Prasanna;Charalampos Chelmis

  • Author_Institution
    Department of Computer Science, University of Southern California, Los Angeles, USA 90089
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Sorting is a key kernel in numerous big data application including database operations, graphs and text analytics. Due to low control overhead, parallel bitonic sorting networks are usually employed for hardware implementations to accelerate sorting. Although a typical implementation of merge sort network can lead to low latency and small memory usage, it suffers from low throughput due to the lack of parallelism in the final stage. We analyze a pipelined merge sort network, showing its theoretical limits in terms of latency, memory and, throughput. To increase the throughput, we propose a merge sort based hybrid design where the final few stages in the merge sort network are replaced with “folded” bitonic merge networks. In these “folded” networks, all the interconnection patterns are realized by streaming permutation networks (SPN). We present a theoretical analysis to quantify latency, memory and throughput of our proposed design. Performance evaluations are performed by experiments on Xilinx Virtex-7 FPGA with post place-androute results. We demonstrate that our implementation achieves a throughput close to 10 GBps, outperforming state-of-the-art implementation of sorting on the same hardware by 1.2x, while preserving lower latency and higher memory efficiency.
  • Keywords
    "Throughput","Sorting","Memory management","Parallel processing","Field programmable gate arrays","Clocks","Hardware"
  • Publisher
    ieee
  • Conference_Titel
    ReConFigurable Computing and FPGAs (ReConFig), 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/ReConFig.2015.7393322
  • Filename
    7393322