• DocumentCode
    1858050
  • Title

    Performance of CUDA Virtualized Remote GPUs in High Performance Clusters

  • Author

    Duato, José ; Pena, A.J. ; Silla, Federico ; Mayo, Rafael ; Quintana-Ortí, Enrique S.

  • Author_Institution
    Univ. Politec. de Valencia (UPV), Valencia, Spain
  • fYear
    2011
  • fDate
    13-16 Sept. 2011
  • Firstpage
    365
  • Lastpage
    374
  • Abstract
    In previous work, we presented the architecture of rCUDA, a middleware that enables CUDA remoting over a commodity network. That is, the middleware allows an application to use a CUDA-compatible Graphics Processor (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. This approach is based on the observation that GPUs in a cluster are not usually fully utilized, and it is intended to reduce the number of GPUs in the cluster, thus lowering the costs related to acquisition and maintenance while keeping performance close to that of the fully-equipped configuration. In this paper we model rCUDA over a series of high throughput networks in order to assess the influence of the performance of the underlying network on the performance of our virtualization technique. For this purpose, we analyze the traces of two different case studies over two different networks. Using this data, we calculate the expected performance for these same case studies over a series of high throughput networks, in order to characterize the expected behavior of our solution in high performance clusters. The estimations are validated using real 1 Gbps Ethernet and 40 Gbps InfiniBand networks, showing an error rate on the order of 1% for executions involving data transfers above 40 MB. In summary, although our virtualization technique noticeably increases execution time when using a 1 Gbps Ethernet network, it performs almost as efficiently as a local GPU when higher performance interconnects are used. Therefore, the small overhead incurred by our proposal because of the remote use of GPUs is worth the savings offered by a cluster configuration with fewer GPUs than nodes.
  • Keywords
    computer graphic equipment; virtual machines; workstation clusters; CUDA virtualized remote GPU; Ethernet; InfiniBand networks; bit rate 1 Gbit/s; bit rate 40 Gbit/s; graphics processor; high performance clusters; high throughput networks; rCUDA; Acceleration; Computer architecture; Graphics processing unit; Kernel; Payloads; Proposals; Servers; CUDA; Clusters; Graphics processors (GPUs); high performance computing; virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2011 International Conference on
  • Conference_Location
    Taipei City
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4577-1336-1
  • Electronic_ISBN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2011.58
  • Filename
    6047204