مرکز منطقه ای اطلاع رساني علوم و فناوري - Affine Parallelization Using Dependence and Cache Analysis in a Binary Rewriter

DocumentCode :

739758

Title :

Affine Parallelization Using Dependence and Cache Analysis in a Binary Rewriter

Author :

Kotha, Aparna ; Anand, Kapil ; Creech, Timothy ; ElWazeer, Khaled ; Smithson, Matthew ; Yellareddy, Greeshma ; Barua, Rajeev

Author_Institution :

Dept. of ECE, Univ. of Maryland, College Park, MD, USA

Volume :

Issue :

fYear :

2015

Firstpage :

2154

Lastpage :

2163

Abstract :

Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. Bridging this disconnect by manually rewriting source code in parallel is prohibitively expensive. Hence, automatic parallelization is an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is the serial binary executable program and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter versus a compiler include: (i) applicability to legacy binaries whose source is not available; (ii) applicability to library code that is often distributed only as binary; (iii) usability by end-user of the software; and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to binaries is a significant challenge. This is primarily because symbolic and array index information used by existing parallelizing compilers to take parallelizing decisions is not available from a binary. We show how to adapt existing parallelization methods to binaries without using symbolic and array index information to achieve equivalent source parallelization from binaries. Results using our x86 binary rewriter called SecondWrite are presented in detail for the dense-matrix and regular Polybench benchmark suite. For these, the average speedup from our method for binaries is 27.95X versus 28.22X from source on 24 threads, compared to the input serial binary. Super-linear speedups are possible due to cache optimizations. In addition our results show that our method is robust and has correctly parallelized much larger binaries from the SPEC 2006 and OMP 2001 benchmark suites, totaling over 2 million source lines of code (SLOC). Good speedups result on the subset of those benchmarks that have affine parallelism in them; this subset exceeds 100,000 SLOC. This robustness is unusual even for the latest leading- source parallelizers, many of which are famously fragile.

Keywords :

affine transforms; general purpose computers; rewriting systems; source code (software); affine parallelization; array index information; assembly-language programs; automatic parallelization; binary rewriter; cache analysis; dense-matrix; dependence analysis; general-purpose computers; library code; parallel binary executable; regular polybench benchmark suite; serial binary executable program; source code rewriting; source parallelization; super-linear speedups; symbolic information; Arrays; Binary codes; Educational institutions; Indexes; Optimization; Parallel processing; Vectors; Parallel programming; automatic programming; multithreading;

fLanguage :

English

Journal_Title :

Parallel and Distributed Systems, IEEE Transactions on

Publisher :

ieee

ISSN :

1045-9219

Type :

jour

DOI :

10.1109/TPDS.2014.2349501

Filename :

6880338

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=739758