Title :
Cholesky Factorization on Heterogeneous CPU and GPU Systems
Author :
Jieyang Chen;Zizhong Chen
Author_Institution :
Dept. of Comput. Sci., Univ. of California, Riverside, Riverside, CA, USA
Abstract :
General-purpose graphics processing units (GPGPUs) can bring large performance improvements to scientific and numerical computing. We present two approaches that use hybrid CPU/GPU systems for Cholesky factorization. First, we analyze the implementation of Cholesky factorization in MAGMA and identify its bottleneck: it uses a fixed block size that does not take the computing environment into account. We therefore design an algorithm that determines the optimal block size for Cholesky factorization based on multiple factors (input matrix size, CPU/GPU performance, CPU/GPU bandwidth, etc.). We then present an improvement to MAGMA's implementation that uses this algorithm. Test results show that our approach is more efficient than MAGMA's fixed-block-size implementation under some circumstances. After combining our implementation with MAGMA's, the new hybrid implementation can outperform the current MAGMA implementation. Second, we observe that, to the best of our knowledge, none of the existing GPU-based implementations of Cholesky factorization fully utilizes the multicore CPU. After studying other researchers' approaches, we design a new algorithm that uses the multicore CPU and the GPU simultaneously in Cholesky factorization. Our approach keeps the block size and the workload distribution between CPU and GPU dynamic. Test results show the optimal data distribution ratio for our current implementation.
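To make the role of the block size concrete, the following is a minimal sketch of a blocked (right-looking) Cholesky factorization in plain Python. It is not the authors' algorithm and not MAGMA's code: the function names `cholesky_unblocked` and `blocked_cholesky` and the parameter `block_size` are hypothetical, everything runs on the CPU, and the comments that map steps to CPU or GPU only describe how a hybrid scheme might plausibly divide the work.

```python
import math


def cholesky_unblocked(a, start, end):
    """Factor the diagonal block a[start:end][start:end] in place (lower part).

    Assumption: in a hybrid scheme this small, mostly sequential step
    would be a natural candidate for the CPU.
    """
    for j in range(start, end):
        d = a[j][j] - sum(a[j][p] ** 2 for p in range(start, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, end):
            s = sum(a[i][p] * a[j][p] for p in range(start, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(matrix, block_size):
    """Return the lower-triangular factor L with matrix = L * L^T.

    block_size (hypothetical name) controls how many columns are
    processed per step; it changes how the work is grouped, not the result.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for start in range(0, n, block_size):
        end = min(start + block_size, n)

        # Step 1: factor the diagonal block.
        cholesky_unblocked(a, start, end)

        # Step 2: triangular solve for the panel below the diagonal block.
        for i in range(end, n):
            for j in range(start, end):
                s = sum(a[i][p] * a[j][p] for p in range(start, j))
                a[i][j] = (a[i][j] - s) / a[j][j]

        # Step 3: symmetric update of the trailing submatrix.
        # Assumption: this is the bulk of the arithmetic and the part a
        # hybrid scheme would most plausibly offload to the GPU.
        for i in range(end, n):
            for j in range(end, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(start, end))

    # Clear the strictly upper triangle so only L remains.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

A small block size means many small diagonal factorizations and many small trailing updates, while a large one means fewer, larger steps. In a hybrid setting that choice shifts the balance between the two devices and the amount of data moved per step, which is why a block size chosen from the matrix size and the hardware characteristics can matter.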
Keywords :
"Graphics processing units","Multicore processing","Data transfer","Central Processing Unit","Algorithm design and analysis","Symmetric matrices","Libraries"
Conference_Title :
Frontier of Computer Science and Technology (FCST), 2015 Ninth International Conference on
DOI :
10.1109/FCST.2015.58