DocumentCode :
3738235
Title :
A hybrid design for high performance large-scale sorting on FPGA
Author :
Ajitesh Srivastava;Ren Chen;Viktor K. Prasanna;Charalampos Chelmis
Author_Institution :
Department of Computer Science, University of Southern California, Los Angeles, USA 90089
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
Sorting is a key kernel in numerous big data application including database operations, graphs and text analytics. Due to low control overhead, parallel bitonic sorting networks are usually employed for hardware implementations to accelerate sorting. Although a typical implementation of merge sort network can lead to low latency and small memory usage, it suffers from low throughput due to the lack of parallelism in the final stage. We analyze a pipelined merge sort network, showing its theoretical limits in terms of latency, memory and, throughput. To increase the throughput, we propose a merge sort based hybrid design where the final few stages in the merge sort network are replaced with “folded” bitonic merge networks. In these “folded” networks, all the interconnection patterns are realized by streaming permutation networks (SPN). We present a theoretical analysis to quantify latency, memory and throughput of our proposed design. Performance evaluations are performed by experiments on Xilinx Virtex-7 FPGA with post place-androute results. We demonstrate that our implementation achieves a throughput close to 10 GBps, outperforming state-of-the-art implementation of sorting on the same hardware by 1.2x, while preserving lower latency and higher memory efficiency.
Keywords :
"Throughput","Sorting","Memory management","Parallel processing","Field programmable gate arrays","Clocks","Hardware"
Publisher :
ieee
Conference_Titel :
ReConFigurable Computing and FPGAs (ReConFig), 2015 International Conference on
Type :
conf
DOI :
10.1109/ReConFig.2015.7393322
Filename :
7393322
Link To Document :
بازگشت