DocumentCode :
560201
Title :
Efficient data race detection for distributed memory parallel programs
Author :
Park, Chang-Seo ; Sen, Koushik ; Hargrove, Paul ; Iancu, Costin
Author_Institution :
Univ. of California, Berkeley, CA, USA
fYear :
2011
fDate :
12-18 Nov. 2011
Firstpage :
1
Lastpage :
12
Abstract :
In this paper we present a precise data race detection technique for distributed memory parallel programs. Our technique, which we call Active Testing, builds on our previous work on race detection for shared memory Java and C programs and it handles programs written using shared memory approaches as well as bulk communication. Active testing works in two phases: in the first phase, it performs an imprecise dynamic analysis of an execution of the program and finds potential data races that could happen if the program is executed with a different thread schedule. In the second phase, active testing re-executes the program by actively controlling the thread schedule so that the data races reported in the first phase can be confirmed. A key highlight of our technique is that it can scalably handle distributed programs with bulk communication and single- and split-phase barriers. Another key feature of our technique is that it is precise: a data race confirmed by active testing is an actual data race present in the program; however, being a testing approach, our technique can miss actual data races. We implement the framework for the UPC programming language and demonstrate scalability up to a thousand cores for programs with both fine-grained and bulk (MPI-style) communication. The tool confirms previously known bugs and uncovers several unknown ones. Our extensions capture constructs proposed in several modern programming languages for High Performance Computing, most notably non-blocking barriers and collectives.
Keywords :
C language; Java; distributed shared memory systems; message passing; parallel programming; C programs; Java programs; MPI style communication; UPC programming language; active testing; data race detection; distributed memory parallel programs; high performance computing; shared memory; Computer bugs; Instruction sets; Message systems; Runtime; Scalability; Synchronization; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0
Type :
conf
Filename :
6114469