• DocumentCode
    2845552
  • Title

    Scalable Dynamic Load Balancing Using UPC

  • Author

    Olivier, Stephen ; Prins, Jan

  • Author_Institution
    Dept. of Comput. Sci., North Carolina Univ. - Chapel Hill, Chapel Hill, NC
  • fYear
    2008
  • fDate
    9-12 Sept. 2008
  • Firstpage
    123
  • Lastpage
    131
  • Abstract
    An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [Olivier, S., et al., 2007]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [Olivier, S., et al., 2007] and MPI [Dinan, J., et al., 2007]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work stealing protocol compared with MPI. However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one sided reads and writes to minimize the impact of high latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.
  • Keywords
    parallel programming; resource allocation; tree data structures; tree searching; UPC; Unified Parallel C; asynchronous work-stealing implementation; available work dissemination; distributed memory; scalable dynamic load balancing; shared memory; synthetic tree-structured search space; termination detection cost; unbalanced tree search; Aerodynamics; Computer science; Cost function; Delay; Hardware; Load management; Parallel processing; Protocols; Read-write memory; State-space methods; Unbalanced Tree Search benchmark; Unified Parallel C; distributed memory systems; load balancing; performance analysis; unbalanced parallel computations; work stealing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Processing, 2008. ICPP '08. 37th International Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    0190-3918
  • Print_ISBN
    978-0-7695-3374-2
  • Electronic_ISBN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2008.19
  • Filename
    4625841