• DocumentCode
    2981478
  • Title
    Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems
  • Author
    Pilla, Laercio L. ; Navaux, Philippe Olivier Alexandre ; Ribeiro, C.P. ; Coucheney, Pierre ; Broquedis, Francois ; Gaujal, Bruno ; Mehaut, J.
  • Author_Institution
    Inst. of Inf., Fed. Univ. of Rio Grande do Sul, Porto Alegre, Brazil
  • fYear
    2012
  • fDate
    17-19 Dec. 2012
  • Firstpage
    236
  • Lastpage
    243
  • Abstract
    Current multi-core machines feature a complex, hierarchical core topology, multiple levels of cache, and a memory subsystem with a NUMA design. Although this design provides high processing power to parallel machines, it comes at the cost of asymmetric memory access latencies. Depending on the communication patterns of the parallel application, this asymmetry may reduce the overall performance of the system. Therefore, to achieve scalable performance in this environment, it becomes crucial to exploit the machine architecture while taking the application's communication patterns into account. In this paper, we introduce a topology-aware load balancing algorithm named HWTOPOLB. It combines the machine topology characteristics with the communication patterns of the application to equalize the application load across the available cores while reducing latencies. We also prove that the algorithm is asymptotically optimal (Theorem 1). We have implemented our load balancing algorithm using the CHARM++ Parallel System and analyzed its performance using three different benchmarks. Our experimental results show that HWTOPOLB can achieve average performance improvements of 24% compared to existing load balancing strategies on three different multi-core machines.
  • Keywords
    multiprocessing systems; parallel machines; resource allocation; topology; CHARM++ Parallel System; NUMA design; asymptotically optimal load balancing; cache subsystem; hierarchical core topology; hierarchical multicore systems; high processing power; machine architecture; machine topology characteristics; memory subsystem; multicore machines; parallel application communication patterns; parallel machines; topology aware load balancing algorithm; Algorithm design and analysis; Benchmark testing; Libraries; Load management; Multicore processing; Runtime; Topology; algorithm; hierarchical architecture; load balancing; multi-core; performance evaluation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on
  • Conference_Location
    Singapore
  • ISSN
    1521-9097
  • Print_ISBN
    978-1-4673-4565-1
  • Electronic_ISBN
    1521-9097
  • Type
    conf
  • DOI
    10.1109/ICPADS.2012.41
  • Filename
    6413691