• DocumentCode
    3000443
  • Title

    Deriving a Methodology for Code Deployment on Multi-Core Platforms via Iterative Manual Optimizations

  • Author

    McCool, Stuart ; Milligan, Peter ; Sage, Paul

  • Author_Institution
    Sch. of Electron., Electr. Eng. & Comput. Sci., Queen's Univ. of Belfast, Belfast, UK
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    1406
  • Lastpage
    1415
  • Abstract
    In recent years, there has been what can only be described as an explosion in the types of processing devices one can expect to find within a given computer system. These include the multi-core CPU, the General Purpose Graphics Processing Unit (GPGPU) and the Accelerated Processing Unit (APU), to name but a few. The widespread uptake of these systems presents would-be users with at least two problems. Firstly, each device exposes a complex underlying architecture which must be appreciated in order to attain optimal performance. This is coupled with the fact that a single system can support an arbitrary number of such devices. Consequently, fully leveraging the performance capabilities of such a system must come at a cost: increasingly prolonged development times. Adhering to a methodology will have the significant industrial impact of reducing these development times. This paper describes the continued formulation of such a novel methodology. Two real-world scientific programs are optimized for execution on the CUDA platform. Double precision accuracy and optimized speedups (which include PCI-E transfer times) of 15x and 17x are achieved.
  • Keywords
    multiprocessing systems; parallel architectures; APU; CUDA platform; GPGPU; PCI-E transfer; accelerated processing unit; code deployment; general purpose graphics processing unit; iterative manual optimizations; multicore CPU; multicore platforms; Acceleration; Graphics processing unit; Guidelines; Kernel; Performance evaluation; Registers; Racah; Slater; artificial intelligence; characterization; heterogeneous computing; methodology; toolset
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2012.178
  • Filename
    6270808