• DocumentCode
    2534926
  • Title

    Automatic thread extraction with decoupled software pipelining

  • Author

    Ottoni, Guilherme ; Rangan, Ram ; Stoler, Adam ; August, David I.

  • Author_Institution
    Dept. of Comput. Sci. & Electr. Eng., Princeton Univ., NJ, USA
  • fYear
    2005
  • fDate
    12-16 Nov. 2005
  • Abstract
    Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide range of applications. Current difficulties in maintaining this trend have led microprocessor manufacturers to add value by incorporating multiple processors on a chip. Unfortunately, since decades of compiler research have not succeeded in delivering automatic threading for prevalent code properties, this approach demonstrates no improvement for a large class of existing codes. To find useful work for chip multiprocessors, we propose an automatic approach to thread extraction, called decoupled software pipelining (DSWP). DSWP exploits the fine-grained pipeline parallelism lurking in most applications to extract long-running, concurrently executing threads. Use of the nonspeculative and truly decoupled threads produced by DSWP can increase execution efficiency and provide significant latency tolerance, mitigating design complexity by reducing intercore communication and per-core resource requirements. Using our initial fully automatic compiler implementation and a validated processor model, we prove the concept by demonstrating significant gains for dual-core chip multiprocessor models running a variety of codes. We then explore simple opportunities missed by our initial compiler implementation which suggest a promising future for this approach.
  • Keywords
    microprocessor chips; multi-threading; multiprocessing systems; pipeline processing; program compilers; automatic compiler implementation; automatic thread extraction; chip multiprocessors; decoupled software pipelining; dual-core chip multiprocessor models; extraction threading; intercore communication; per-core resource requirements; processor model; Application software; Clocks; Delay; Hardware; Manufacturing processes; Microarchitecture; Microprocessors; Parallel processing; Pipeline processing; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture, 2005. MICRO-38. Proceedings. 38th Annual IEEE/ACM International Symposium on
  • Print_ISBN
    0-7695-2440-0
  • Type

    conf

  • DOI
    10.1109/MICRO.2005.13
  • Filename
    1540952