• DocumentCode
    3322193
  • Title
    A Parallel Implementation of the 2D Wavelet Transform Using CUDA
  • Author
    Franco, Joaquín; Bernabe, Gregorio; Fernandez, J.; Acacio, Manuel E.
  • Author_Institution
    Dipt. de Ing. y Tecnol. de Comput., Univ. de Murcia, Murcia
  • fYear
    2009
  • fDate
    18-20 Feb. 2009
  • Firstpage
    111
  • Lastpage
    118
  • Abstract
    One multicore platform is currently attracting enormous attention because of its potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices. They are based on the Compute Unified Device Architecture (CUDA), which is common to the latest NVIDIA GPUs. The result is a multicore platform that offers a large potential performance benefit, driven by a non-traditional programming model. In this paper we aim to provide some insight into the peculiarities of CUDA for scientific computing by means of a specific example. In particular, we show that the parallelization of the two-dimensional fast wavelet transform for the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192×8192 when compared with the fastest host-only implementation using OpenMP, including the data transfers between main memory and device memory.
  • Keywords
    computer graphics; microprocessor chips; parallel programming; wavelet transforms; 2D wavelet transform; CUDA; NVIDIA Tesla boards; OpenMP; compute unified device architecture; general-purpose computing on graphic processing units; parallel computing devices; Central Processing Unit; Computer architecture; Discrete cosine transforms; Graphics; Image coding; Multicore processing; Scientific computing; Video compression; Yarn; 2D fast wavelet transform; NVIDIA Tesla; multicore processor
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
  • Conference_Location
    Weimar
  • ISSN
    1066-6192
  • Print_ISBN
    978-0-7695-3544-9
  • Type
    conf
  • DOI
    10.1109/PDP.2009.40
  • Filename
    4912922