Author_Institution :
Heat, Light & Sound Res. Inc., La Jolla, CA, USA
Abstract :
The quest for raw computing power has shifted from increasing processor clock speeds to increasing the number of processing cores. Currently, mainstream CPUs can be purchased in dual-slot quad-core and hex-core configurations. Graphics cards, on the other hand, provide hundreds of processing cores. Although there have been various implementations of scientific applications on graphics hardware, including underwater acoustic modeling, widespread use of this technology has been hampered by the often extraordinary effort needed to program this hardware, especially if the application architecture did not match the canonical graphics pipeline for gaming. In the last few years, the major graphics board manufacturers have stepped away from designing hardware specialized for particular new graphics special effects and made a concerted effort to provide general-purpose computing capabilities of the sort that can be exploited for scientific computing. For example, NVIDIA's CUDA environment currently provides many building blocks for scientific computing, such as (subsets of) BLAS, LAPACK, and FFTs. We will present our experiences implementing the split-step Fourier parabolic equation (PE) model in NVIDIA's "Compute Unified Device Architecture" (CUDA) environment, showing how we have achieved a 10 times speedup relative to a multi-core CPU implementation with a modest investment in programming effort.
In the repertoire of wave propagation modeling approaches, a parabolic equation model is typically used for range-dependent problems in situations where a ray tracing approach would not provide enough fidelity (e.g., because a high-frequency approximation is not warranted for the waveguide being modeled). PE models are narrowband models, so a broadband application requires running multiple frequencies to cover the band of interest, followed by a synthesis via inverse FFT to form the predicted time-domain waveform; the independent frequency runs offer obvious opportunities for parallelization. This application was initially selected because its key software component, the FFT, was available in a mature GPU-based implementation. In addition, a multi-core CPU implementation of the FFT was also available, enabling a very direct comparison of CPU versus GPU performance using nearly identical code bases. We will describe the key steps needed to adapt this model to the GPU architecture. For example, an important aspect of accelerating applications on GPU architectures is taking effective advantage of the features of the different memory types that reside on GPUs. Since the bandwidth between cores within the GPU is 5-10 times greater than the bandwidth from the CPU to the GPU, it is important to minimize the amount of data transferred in and out of the GPU.
Fortunately, GPUs also have a type of memory called texture memory, which conveniently provides hardware-accelerated interpolation. Thus, a sparse representation of the range-dependent waveguide parameters (sound speed profile, bathymetry, geo-acoustic parameters of the seabed) can be loaded into texture memory,
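
As an illustration of the numerical pattern the abstract refers to, the following is a minimal CPU-side sketch of one split-step Fourier PE range step, written in plain Python rather than CUDA. It is not the authors' implementation. It assumes the standard narrow-angle split-step form (a forward transform in depth, a phase multiplication in vertical wavenumber, an inverse transform, then a phase multiplication by the local index of refraction). All names (`dft`, `lerp_profile`, `pe_step`) and all numerical values are hypothetical. A naive DFT stands in for the FFT library, software linear interpolation stands in for what texture hardware would do, and the depth grid is treated as periodic with no surface, bottom, or attenuation handling.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
        out.append(acc / n if inverse else acc)
    return out


def lerp_profile(depths, speeds, z):
    """Linearly interpolate a sparse sound speed profile at depth z.

    Values are clamped at both ends. This mimics in software the kind of
    interpolation that GPU texture memory performs in hardware.
    """
    if z <= depths[0]:
        return speeds[0]
    for i in range(1, len(depths)):
        if z <= depths[i]:
            t = (z - depths[i - 1]) / (depths[i] - depths[i - 1])
            return speeds[i - 1] + t * (speeds[i] - speeds[i - 1])
    return speeds[-1]


def pe_step(psi, k0, dr, dz, n2):
    """Advance the field psi(z) by one range step dr (assumed narrow-angle form)."""
    n = len(psi)
    spec = dft(psi)
    for m in range(n):
        # Vertical wavenumber of bin m; the upper half holds negative wavenumbers.
        idx = m if m < n // 2 else m - n
        kz = 2.0 * math.pi * idx / (n * dz)
        spec[m] *= cmath.exp(-1j * dr * kz * kz / (2.0 * k0))
    field = dft(spec, inverse=True)
    # Refraction phase screen from the squared index of refraction n2(z).
    return [p * cmath.exp(1j * k0 * dr * (n2[i] - 1.0) / 2.0)
            for i, p in enumerate(field)]


# Hypothetical test case: 100 Hz source, 64 depth points, 2 m spacing.
freq, c0 = 100.0, 1500.0
k0 = 2.0 * math.pi * freq / c0
nz, dz, dr = 64, 2.0, 10.0

# Sparse profile (three points) expanded onto the computational grid.
depths = [0.0, 50.0, 128.0]
speeds = [1500.0, 1490.0, 1510.0]
n2 = [(c0 / lerp_profile(depths, speeds, i * dz)) ** 2 for i in range(nz)]

# Gaussian starter field centred at 40 m depth.
psi = [cmath.exp(-((i * dz - 40.0) / 10.0) ** 2) for i in range(nz)]
energy_before = sum(abs(p) ** 2 for p in psi)

for _ in range(5):
    psi = pe_step(psi, k0, dr, dz, n2)

energy_after = sum(abs(p) ** 2 for p in psi)
print(energy_before, energy_after)
```

Because both phase factors have unit modulus in this lossless sketch, the summed squared magnitude of the field should be preserved from step to step, which gives a simple sanity check. In a GPU port of this pattern, the two transforms would be the natural candidates for a library FFT, and keeping `psi` resident on the device between range steps would be one way to limit the CPU-to-GPU transfers the abstract warns about.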
Keywords :
fast Fourier transforms; graphics processing units; multiprocessing systems; parabolic equations; parallel architectures; underwater acoustic propagation; BLAS; FFT; GPU architectures; LAPACK; bathymetry; broadband application; compute unified device architecture; general purpose graphic processing units; general-purpose computing capabilities; graphics hardware; multicore CPU implementation; narrowband models; range-dependent problems; scientific computing; seabed geo-acoustic parameters; sound speed profile; split-step Fourier parabolic equation model; texture memory; time-domain waveform; underwater acoustic propagation modeling; wave propagation modeling approach; Computational modeling; Graphics; Graphics processing unit; Hardware; Instruction sets; Kernel; Mathematical model; CUDA; Split-step Fourier parabolic equation; general purpose graphic processing unit; high-performance computing;