• DocumentCode
    3210146
  • Title

    A Systemic Strategy for Tuning Intra-node Collective Communication on Multicore Systems

  • Author

    Liu, Zhiqiang ; Song, Junqiang ; Ren, Kaijun ; Xu, Fen ; Qu, Xiaoling

  • Author_Institution
    Coll. of Comput., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2009
  • fDate
    17-19 Dec. 2009
  • Firstpage
    14
  • Lastpage
    21
  • Abstract
    In the HPC domain, a majority of applications are built on MPI and employ collective operations in their communication kernels. Improving the performance of collectives has long been the focus of much work. Recently, in the optimization of collectives on multi-core clusters, hierarchical algorithm designs have been remarkable. Such algorithms can greatly reduce inter-node traffic, but they increase the intra-node traffic load at the same time. Meanwhile, in hierarchical collectives, the intra-node part takes more and more of the time as the number of cores in each node keeps growing. Improving the performance of intra-node collectives is therefore critical to overall performance. However, on multi-core systems, process affinity greatly impacts the performance of an intra-node collective, which makes improving the overall performance of intra-node collectives challenging. To address this problem, we propose a novel and systemic strategy for tuning the performance of intra-node collectives. As illustrative examples, we have implemented our strategy on a dual-socket Intel Clovertown platform and successfully tuned Broadcast and Allgather, improving their performance by up to 14% and 52%, respectively.
  • Keywords
    message passing; microprocessor chips; MPI; dual-socket Intel Clovertown platform; hierarchical algorithm designs; high performance computing; intranode collective communication tuning; intranode traffic load; multicore clusters; multicore systems; systemic strategy; Algorithm design and analysis; Application software; Broadcasting; Clustering algorithms; Computer science; Design optimization; Educational institutions; Kernel; Multicore processing; Telecommunication traffic; Allgather; Broadcast; Collective communication; MPI; Multicore;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontier of Computer Science and Technology, 2009. FCST '09. Fourth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3932-4
  • Electronic_ISBN
    978-1-4244-5467-9
  • Type

    conf

  • DOI
    10.1109/FCST.2009.101
  • Filename
    5392942