• DocumentCode
    2967376
  • Title
    Portable software development for multi-core processors, many-core accelerators, and heterogeneous architectures
  • Author
    McCool, Michael
  • Author_Institution
    University of Waterloo, USA
  • fYear
    2008
  • fDate
    9-10 Aug. 2008
  • Abstract
    New processor architectures, including many-core accelerators like GPUs, multi-core CPUs, and heterogeneous architectures like the Cell BE, provide many opportunities for improved performance. However, programming these architectures productively in a performant and portable way is challenging. We have developed a software development platform that uses a common SPMD parallel programming model for all these processor architectures. The RapidMind platform allows developers, using an existing, standard C++ compiler, to easily create single-source, conceptually single-threaded programs that can target all the processing resources in such architectures. When compared to tuned baseline code using the best optimizing C++ compilers available, RapidMind-enabled code can demonstrate speedups of over an order of magnitude on x86 dual-processor quad-core systems (more than the number of cores, due to the enhanced data locality of the RapidMind programming model) and two orders of magnitude on accelerators. In this talk, I will discuss the performance strategy used by the RapidMind platform, which is based on the observation that only two things really matter for performance: parallelism and data locality. A developer should be provided with mechanisms for expressing these two crucial facets of an implementation directly and conveniently. At the same time, to enhance portability and productivity, a programming system should avoid over-specification of details that can be optimized by the system itself (in a portable way) and, to minimize debugging, should emphasize correct-by-construction parallel programming patterns. Finally, resource limits and performance cliffs inhibit portability, but these issues can be addressed by allowing the specification of parameterized code and by using auto-tuning.
  • Keywords
    Computer architecture; Function approximation; Graphics; Multicore processing; Optimizing compilers; Parallel processing; Parallel programming; Program processors; Rendering (computer graphics); Standards development
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Interactive Ray Tracing, 2008. RT 2008. IEEE Symposium on
  • Conference_Location
    Los Angeles, CA, USA
  • Print_ISBN
    978-1-4244-2741-3
  • Type
    conf
  • DOI
    10.1109/RT.2008.4634608
  • Filename
    4634608