• DocumentCode
    2397566
  • Title
    On improving the performance of data partitioning oriented parallel irregular reductions
  • Author
    Gutiérrez, E.; Plata, O.; Zapata, E.L.
  • Author_Institution
    Dept. of Comput. Archit., Malaga Univ., Spain
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    445
  • Lastpage
    452
  • Abstract
    Different parallelization techniques for reductions have been classified in this paper into two classes: LPO (loop partitioning-oriented techniques) and DPO (data partitioning-oriented techniques). We have analyzed both classes in terms of a set of performance properties: data locality, memory overhead, parallelism and workload balancing. We propose several techniques to increase the exploited parallelism and to introduce load balancing into a DPO method. Regarding parallelism, the solution is based on the partial expansion of the reduction array. For load balancing, the first technique is generic, as it can deal with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance that appears when there is a large number of write operations on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for the DWA-LIP (data write affinity-loop index prefetching) DPO method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods.
  • Keywords
    arrays; numerical analysis; parallel algorithms; resource allocation; shared memory systems; software performance evaluation; DWA-LIP method; data locality; data partitioning-oriented parallel irregular reductions; data-write affinity; dynamic kernel codes; load unbalancing; loop partitioning-oriented techniques; loop-index prefetching; memory overhead; optimizing solutions; parallelism; parallelization techniques; performance improvement; performance properties; reduction array partial expansion; static kernel codes; workload balancing; write operations; Computer architecture; Concurrent computing; Kernel; Load management; Optical computing; Optimization methods; Parallel processing; Privatization; Testing; Yarn
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel, Distributed and Network-based Processing, 2002. Proceedings. 10th Euromicro Workshop on
  • Conference_Location
    Canary Islands
  • Print_ISBN
    0-7695-1444-8
  • Type
    conf
  • DOI
    10.1109/EMPDP.2002.994330
  • Filename
    994330