Title : 
Accelerating range-based loops on heterogeneous systems
         
        
            Author : 
Suwancharoen, Chaturapat ; Marurngsith, Worawan
         
        
            Author_Institution : 
Dept. of Comput. Sci., Thammasat Univ., Pathum Thani, Thailand
         
        
        
        
        
        
            Abstract : 
The range-based loop is a powerful construct due to its clear and concise syntax. Because the loop index is abstracted away, a range-based loop implies loop-level parallelism that is ready to be exploited. Despite these advantages in hidden parallelism and programmability, the magnitude of the performance gain from accelerating range-based loops on heterogeneous systems is still not well studied. This paper addresses this issue and makes three contributions. First, a review is presented of the performance gains from CUDA/OpenCL code generated by ten existing auto-parallelizing compilers. Second, a performance comparison between the acceleration of range-based loops and of traditional loops is reported for four workloads from the SHOC benchmark. Third, the performance limitations of using a directive-based compiler to accelerate range-based loops are discussed. The results show that transforming scientific workloads to exploit range-based loops is a challenge. The review shows that code generated by auto-parallelizing compilers achieved an average speedup of 37±23x relative to sequential CPU code, while the proposed range-based compiler achieved a higher-than-average speedup (44.8±22x). The evaluation against four workloads from the highly tuned benchmark shows that range-based loop acceleration achieved, on average, 72% of the benchmark's performance. This highlights range-based loops as a promising target for auto-parallelizing compilers that generate code for heterogeneous systems.
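To illustrate the distinction the abstract draws between traditional and range-based loops, the following is a minimal sketch, assuming C++11 range-based `for` syntax (the abstract does not name the language). The function names `scale_indexed` and `scale_range_based` are hypothetical and are not taken from the paper; the sketch shows only the loop forms, not the paper's compiler or its generated CUDA/OpenCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Traditional loop: the index i is explicit, so a compiler has to analyze
// how i is used before it can conclude the iterations are independent.
void scale_indexed(const std::vector<float>& in, std::vector<float>& out, float a) {
    assert(out.size() == in.size());  // precondition for this sketch
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = a * in[i];
    }
}

// Range-based loop: no index is visible. Each element is visited once and
// the body touches only that element, which is the kind of loop-level
// parallelism the abstract describes as ready to be exploited.
std::vector<float> scale_range_based(std::vector<float> data, float a) {
    for (float& x : data) {
        x *= a;
    }
    return data;
}
```

Both functions compute the same element-wise scaling; the second simply hides the iteration order from the programmer, which is what would let an auto-parallelizing compiler map each element to a separate GPU work-item.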
         
        
            Keywords : 
application program interfaces; parallel architectures; parallel programming; parallelising compilers; program control structures; software performance evaluation; CUDA/OpenCL code; SHOC benchmark; auto-parallelizing compilers; directive-based compiler; heterogeneous systems; range-based loop acceleration; Acceleration; Benchmark testing; Containers; Graphics processing units; Parallel processing; Performance evaluation; Performance gain; GPU; OpenCL; loop parallelization
         
        
        
        
            Conference_Title : 
Knowledge and Smart Technology (KST), 2015 7th International Conference on
         
        
            Conference_Location : 
Chonburi
         
        
            Print_ISBN : 
978-1-4799-6048-4
         
        
        
            DOI : 
10.1109/KST.2015.7051466