• DocumentCode
    167451
  • Title
    Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs
  • Author
    Donfack, Simplice ; Tomov, Stanimire ; Dongarra, Jack

  • Author_Institution
    Innovative Comput. Lab., Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    958
  • Lastpage
    965
  • Abstract
    Graphics processing units (GPUs) have brought huge performance improvements to scientific and numerical computing. We present an efficient hybrid CPU/GPU approach that is portable, dynamically and efficiently balances the workload between the CPUs and the GPUs, and avoids data transfer bottlenecks that are frequently present in numerical algorithms. Our approach determines the amount of initial work to assign to the CPUs before execution, and then dynamically balances the workload during execution. We also present a theoretical model to guide the choice of the initial amount of work for the CPUs. The validation of our model allows our approach to self-adapt to any architecture using the manufacturer's characteristics of the underlying machine. We illustrate our method on the LU factorization. For this case, we show that combining our approach with a communication-avoiding LU algorithm is efficient. For example, our experiments on a 24-core AMD Opteron 6172 show that by adding one GPU (Tesla S2050) we accelerate LU by up to 2.4× compared to the corresponding routine in MKL using 24 cores. Comparisons with MAGMA also show significant improvements.
  • Keywords
    graphics processing units; mathematics computing; matrix decomposition; multiprocessing systems; resource allocation; synchronisation; AMD Opteron 6172; LU factorization; MAGMA; MKL; data transfer; dynamic workload balancing; dynamically balanced synchronization; graphics processing units; hybrid CPU-GPU approach; manufacturer characteristics; multicore system; performance improvements; workload balances; Computer architecture; Graphics processing units; Heuristic algorithms; Load management; Matrix decomposition; Partitioning algorithms; LU factorization; hybrid CPU/GPU programming; synchronization avoiding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.109
  • Filename
    6969485