• DocumentCode
    3222008
  • Title
    Compiler algorithms for optimizing locality and parallelism on shared and distributed memory machines
  • Author
    Kandemir, M. ; Ramanujam, J. ; Choudhary, A.
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Syracuse Univ., NY, USA
  • fYear
    1997
  • fDate
    10-14 Nov 1997
  • Firstpage
    236
  • Lastpage
    247
  • Abstract
    Distributed memory message passing machines can deliver scalable performance but are difficult to program. Shared memory machines, on the other hand, are easier to program, but obtaining scalable performance with a large number of processors is difficult. Previously, some scalable architectures based on logically-shared, physically-distributed memory have been designed and implemented. While some performance issues, such as parallelism and locality, are common to the different parallel architectures, issues such as data decomposition are unique to specific types of architectures. One of the most important challenges compiler writers face is to design compilation techniques that can work on a variety of architectures. In this paper, we propose an algorithm that can be employed by optimizing compilers for different types of parallel architectures. Our optimization algorithm does the following: (1) transforms loop nests such that, where possible, the outermost loops can be run in parallel across processors; (2) decomposes each array across processors; (3) optimizes interprocessor communication by vectorizing it whenever possible; and (4) optimizes locality (cache performance) by assigning an appropriate storage layout for each array. Depending on the underlying hardware system, some or all of these steps can be applied in a unified framework. We present simulation results for cache miss rates, and empirical results on SUN SPARCstation 5, IBM SP-2, SGI Challenge and Convex Exemplar, to validate the effectiveness of our approach on different architectures.
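    The abstract's step (1) and step (4) can be illustrated with a classic loop-order transformation on matrix multiplication (one of the kernels named in the keywords). The sketch below is not the paper's algorithm, only a minimal hand-applied example of the kind of transformation such a compiler performs: permuting a loop nest so that the innermost accesses are stride-1 for a row-major storage layout, while the dependence-free outermost loop is the one a compiler could run in parallel across processors.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    #define N 64

    /* Illustrative sketch (not the paper's algorithm): for row-major arrays,
     * the i-k-j loop order makes the innermost accesses to b and c stride-1,
     * improving cache locality. The outermost i loop carries no
     * cross-iteration dependences, so each iteration (row of c) could be
     * assigned to a different processor. */
    static void matmul_ikj(const double *a, const double *b, double *c) {
        for (int i = 0; i < N; i++)          /* parallelizable across processors */
            for (int k = 0; k < N; k++) {
                double aik = a[i * N + k];   /* invariant in the inner loop */
                for (int j = 0; j < N; j++)  /* stride-1 over rows of b and c */
                    c[i * N + j] += aik * b[k * N + j];
            }
    }

    int main(void) {
        double *a = calloc(N * N, sizeof *a);
        double *b = calloc(N * N, sizeof *b);
        double *c = calloc(N * N, sizeof *c);
        /* a = b = identity, so the product c must be the identity as well */
        for (int i = 0; i < N; i++) { a[i * N + i] = 1.0; b[i * N + i] = 1.0; }
        matmul_ikj(a, b, c);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                assert(c[i * N + j] == (i == j ? 1.0 : 0.0));
        free(a); free(b); free(c);
        return 0;
    }
    ```

    Steps (2) and (3) of the paper would go further, distributing the rows of each array across processor memories and batching (vectorizing) the resulting messages; the loop permutation shown here is only the locality/parallelism portion.
    
    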
  • Keywords
    cache storage; distributed memory systems; matrix multiplication; optimising compilers; parallel architectures; parallelising compilers; shared memory systems; Convex Exemplar; IBM SP-2; SGI Challenge; SUN SPARCstation 5; cache performance; compiler algorithms; data decomposition; distributed memory machines; interprocessor communication; loop nests; optimizing compilers; parallel architectures; shared memory machines; storage layout; Cache storage; Hardware; Memory architecture; Message passing; Optimizing compilers; Parallel architectures; Parallel machines; Parallel processing; Random access memory; Sun;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
  • Conference_Location
    San Francisco, CA
  • Print_ISBN
    0-8186-8090-3
  • Type
    conf
  • DOI
    10.1109/PACT.1997.644019
  • Filename
    644019