Title :
Portable software development for multi-core processors, many-core accelerators, and heterogeneous architectures
Abstract :
New processor architectures, including multi-core CPUs, many-core accelerators such as GPUs, and heterogeneous architectures such as the Cell BE, offer many opportunities for improved performance. However, programming these architectures productively, in a way that is both performant and portable, is challenging. We have developed a software development platform that uses a common SPMD parallel programming model for all of these processor architectures. Using an existing, standard C++ compiler, developers can easily create single-source, conceptually single-threaded RapidMind programs that target all the processing resources in such architectures. When compared to tuned baseline code compiled with the best optimizing C++ compilers available, RapidMind-enabled code can demonstrate speedups of over an order of magnitude on x86 dual-processor quad-core systems (that is, more than the number of cores, owing to the enhanced data locality of the RapidMind programming model) and two orders of magnitude on accelerators. In this talk, I will discuss the performance strategy used by the RapidMind platform, which is based on the observation that only two things really matter for performance: parallelism and data locality. A developer should be provided with mechanisms for expressing these crucial facets of an implementation directly and conveniently. At the same time, to enhance portability and productivity, a programming system should avoid over-specifying details that the system itself can optimize in a portable way and, to minimize debugging, should emphasize correct-by-construction parallel programming patterns. Finally, resource limits and performance cliffs inhibit portability, but these issues can be addressed by allowing the specification of parameterized code and by using auto-tuning.
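
As an illustration of the last point only, the following is a minimal C++ sketch of parameterized code combined with auto-tuning. It does not use the RapidMind API, and every name in it (`saxpy_tiled`, `autotune_tile`, the `tile` parameter, the candidate values) is a hypothetical chosen for this example. The kernel applies the same operation to every element, in the SPMD spirit, and exposes its block size as a parameter; the tuner simply times each candidate on the current machine and keeps the fastest.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel: y[i] = a * x[i] + y[i], processed in blocks of
// `tile` elements. The tile size is the tunable parameter; it changes
// the traversal order (and thus locality), never the result.
void saxpy_tiled(float a, const std::vector<float>& x,
                 std::vector<float>& y, std::size_t tile) {
    assert(tile > 0 && x.size() == y.size());
    const std::size_t n = x.size();
    for (std::size_t begin = 0; begin < n; begin += tile) {
        const std::size_t end = std::min(n, begin + tile);
        // Each element runs the same program on different data, and the
        // iterations are independent, so a runtime could execute them in
        // parallel. This sketch runs them sequentially.
        for (std::size_t i = begin; i < end; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}

// Hypothetical auto-tuner: times every candidate tile size on the given
// input and returns the one with the shortest measured run time.
std::size_t autotune_tile(const std::vector<std::size_t>& candidates,
                          const std::vector<float>& x, int repeats = 3) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    auto best_time = std::chrono::steady_clock::duration::max();
    for (std::size_t tile : candidates) {
        std::vector<float> y(x.size(), 1.0f);  // allocated outside the timed region
        const auto start = std::chrono::steady_clock::now();
        for (int r = 0; r < repeats; ++r) {
            saxpy_tiled(2.0f, x, y, tile);
        }
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = tile;
        }
    }
    return best;
}
```

A caller would run `autotune_tile({16, 64, 256}, x)` once per target machine and pass the returned value to `saxpy_tiled` afterwards. Which candidate wins depends on the hardware and on timing noise, which is why the choice is measured rather than hard-coded.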