  • DocumentCode
    3091684
  • Title
    Augmentation of Programs with CUDA Streams
  • Author
    Sharmistha; Amilkanthwar, Madhur; Balachandran, Shankar

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, Chennai, India
  • fYear
    2012
  • fDate
    10-13 July 2012
  • Firstpage
    855
  • Lastpage
    856
  • Abstract
    A program running on a General Purpose Graphics Processing Unit (GPGPU) has to stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computation, which typically demands significant effort from the programmer. We propose an approach for transforming C programs into programs that make use of CUDA streams. We identify the regions where data transfer and computation can be overlapped by using a polyhedral framework called PLUTO [2]. We use the PLUTO framework to tile the source code automatically and use the streaming capabilities to overlap data transfer with computation. Our results show an average speedup of 1.5x over CUDA programs without streaming optimizations.
  • Keywords
    C language; graphics processing units; parallel architectures; parallel programming; C programs; CUDA 2.0 architecture; CUDA streams; GPGPU; PLUTO framework; automatic tiling; data transfer; general purpose graphics processing unit; parallel computing; polyhedral framework; source code; Graphics processing unit; Kernel; Optimization; Parallel processing; Pluto; Tiles; CUDA
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on
  • Conference_Location
    Leganes
  • Print_ISBN
    978-1-4673-1631-6
  • Type
    conf
  • DOI
    10.1109/ISPA.2012.132
  • Filename
    6280393