  • DocumentCode
    1925260
  • Title
    Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework
  • Author
    Rajachandrasekar, Raghunath ; Jaswani, Jai ; Subramoni, Hari ; Panda, Dhabaleswar K. (DK)
  • Author_Institution
    Network-Based Comput. Lab., Ohio State Univ., Columbus, OH, USA
  • fYear
    2012
  • fDate
    24-28 Sept. 2012
  • Firstpage
    329
  • Lastpage
    336
  • Abstract
    The rapid growth of supercomputing systems, both in scale and complexity, has been accompanied by a degradation in system efficiency. Despite their sheer abundance, resources including millions of cores, vast amounts of physical memory, and high-bandwidth networks are heavily under-utilized. This happens because resources are time-shared among parallel applications, each scheduled to run exclusively on a subset of compute nodes. Several space-sharing techniques proposed in the literature allow parallel applications to be co-located on compute nodes and to share resources with each other. Although this improves system efficiency, it also causes contention for system resources. In this work, we specifically address network contention caused by parallel applications and file systems sharing network resources simultaneously. We leverage the Quality-of-Service (QoS) capabilities of the widely used InfiniBand interconnect to enhance our data-staging file system, making it QoS-aware. The result is a user-level framework that is agnostic to the file system and MPI implementation. Using this file system, we demonstrate the isolation of file system traffic from MPI communication traffic, thereby reducing network contention. Experimental results show that, in the presence of contending I/O traffic, MPI point-to-point latency can be reduced by up to 320 microseconds and bandwidth improved by up to 674 MB/s. Furthermore, we were able to reduce the runtime of the AWP-ODC MPI application by about 9.89% in the presence of network contention, and to reduce the time the NAS CG kernel spends in communication by 23.46%.
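    A toy illustration of why separating traffic classes reduces contention (a simplified sketch under stated assumptions, not the authors' framework): InfiniBand tags each packet with a Service Level (SL) that the fabric maps to a Virtual Lane (VL), and the VLs sharing a link are served by a weighted arbiter. The Python model below assumes a hypothetical mapping of MPI traffic to SL 0 / VL 0 and I/O traffic to SL 1 / VL 1, one packet per time slot, and plain weighted round-robin in place of the real InfiniBand VL arbitration tables. It compares when a short MPI message finishes if it queues behind an I/O burst in one shared queue versus on its own lane.

```python
from collections import deque

# Hypothetical SL numbers and SL-to-VL table (assumptions for illustration only).
MPI_SL = 0
IO_SL = 1
SL_TO_VL = {MPI_SL: 0, IO_SL: 1}


def shared_queue_finish(io_packets: int, mpi_packets: int) -> int:
    """Slot in which the last MPI packet leaves when all traffic shares one FIFO.

    The I/O burst is enqueued first, so the MPI packets wait behind all of it.
    """
    queue = deque(["io"] * io_packets + ["mpi"] * mpi_packets)
    slot = last_mpi = 0
    while queue:
        slot += 1  # one packet crosses the link per slot
        if queue.popleft() == "mpi":
            last_mpi = slot
    return last_mpi


def separate_lanes_finish(io_packets: int, mpi_packets: int, weights: dict[int, int]) -> int:
    """Same traffic, but each class is steered to its own VL through SL_TO_VL.

    The VLs are drained by simple weighted round-robin: in every round, VL v may
    send up to weights[v] packets. This simplifies real InfiniBand VL arbitration.
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("every lane needs a positive weight")
    lanes = {vl: deque() for vl in weights}
    lanes[SL_TO_VL[IO_SL]].extend(["io"] * io_packets)
    lanes[SL_TO_VL[MPI_SL]].extend(["mpi"] * mpi_packets)
    slot = last_mpi = 0
    while any(lanes.values()):  # loop until every lane is empty
        for vl, weight in weights.items():
            for _ in range(weight):
                if not lanes[vl]:
                    break
                slot += 1
                if lanes[vl].popleft() == "mpi":
                    last_mpi = slot
    return last_mpi


if __name__ == "__main__":
    # A 100-packet I/O burst followed by a 4-packet MPI message.
    print("shared queue:", shared_queue_finish(100, 4))                  # 104
    print("separate VLs:", separate_lanes_finish(100, 4, {0: 1, 1: 1}))  # 7
```

    With equal weights the MPI message finishes in slot 7 instead of slot 104, because it no longer waits behind the whole I/O burst. The weights express how much of the link each class receives; in this model they are the kind of knob a QoS-aware data stager could tune.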
  • Keywords
    application program interfaces; file organisation; input-output programs; parallel machines; pattern clustering; processor scheduling; quality of service; AWP-ODC MPI; I/O traffic; InfiniBand cluster; MPI communication traffic; NAS CG kernel; QoS; bandwidth 674 MB/s; data staging filesystem; data staging framework; network contention minimization; network resource sharing; parallel application; point-to-point latency; quality of service; scheduling; sheer abundance; space sharing technique; supercomputing system; Bandwidth; Fabrics; Kernel; Libraries; Noise; Quality of service; Servers; Data-Staging; InfiniBand; Network Contention and Filesystems; Quality-of-Service; Space-Sharing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2012 IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2422-9
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2012.90
  • Filename
    6337795