• DocumentCode
    2440950
  • Title

    Speculative execution on multi-GPU systems

  • Author

    Diamos, Gregory ; Yalamanchili, Sudhakar

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to accelerators in order to utilize them effectively. An accelerator is a processor with a different ISA and micro-architecture than the main CPU. These static partitioning schemes are effective when targeting a system with only a single accelerator. However, they are not robust to changes in the number of accelerators or the performance characteristics of future generations of accelerators. In previous work, we presented the Harmony execution model for computing on heterogeneous systems with several CPUs and accelerators. In this paper, we extend Harmony to target systems with multiple accelerators using control speculation to expose parallelism. We refer to this technique as Kernel Level Speculation (KLS). We argue that dynamic parallelization techniques such as KLS are sufficient to scale applications across several accelerators based on the intuition that there will be fewer distinct accelerators than cores within each accelerator. In this paper, we use a complete prototype of the Harmony runtime that we developed to explore the design decisions and trade-offs in the implementation of KLS. We show that KLS improves parallelism to a sufficient degree while retaining a sequential programming model. We accomplish this by demonstrating good scaling of KLS on a highly heterogeneous system with three distinct accelerator types and ten processors.
  • Keywords
    coprocessors; multiprocessing systems; parallel programming; CUDA; Harmony execution model; Harmony runtime; ISA; OpenCL; accelerator; application partitioning; components; computational capability; dynamic parallelization techniques; heterogeneous many-core processors; heterogeneous system; kernel level speculation; micro-architecture; multi-GPU systems; parallel programming languages; parallel programming models; sequential programming model; speculative execution; Character generation; Concurrent computing; Control systems; Instruction sets; Kernel; Parallel processing; Parallel programming; Prototypes; Robustness; Runtime;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-6442-5
  • Type

    conf

  • DOI
    10.1109/IPDPS.2010.5470427
  • Filename
    5470427