DocumentCode :
2958106
Title :
Efficient Resource Oblivious Algorithms for Multicores with False Sharing
Author :
Cole, Richard ; Ramachandran, Vijaya
Author_Institution :
Comput. Sci. Dept., NYU, New York, NY, USA
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
201
Lastpage :
214
Abstract :
We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur. False sharing happens when two or more processors access the same block (i.e., cache-line) in parallel, and at least one processor writes into a location in the block. False sharing causes different processors to have inconsistent views of the data in the block, and many of the methods currently used to resolve these inconsistencies can cause large delays. We analyze the cost of false sharing both for variables stored on the execution stacks of the parallel tasks and for output variables. Our main technical contribution is to establish a low cost for this overhead for the class of multithreaded block-resilient HBP (Hierarchical Balanced Parallel) computations. Using this and other techniques, we develop block-resilient HBP algorithms with low false sharing costs for several fundamental problems including scans, matrix multiplication, FFT, sorting, and hybrid block-resilient HBP algorithms for list ranking and graph connected components. Most of these algorithms are derived from known multicore algorithms, but are further refined to achieve a low false sharing overhead. Our algorithms make no mention of machine parameters, and our analysis of the false sharing overhead is mostly in terms of the the number of tasks generated in parallel during the computation, and thus applies to a variety of schedulers.
Keywords :
cache storage; graph theory; matrix multiplication; multi-threading; multiprocessing systems; parallel algorithms; parallel architectures; resource allocation; FFT; block location; execution stacks; false sharing; graph connected components; hybrid block-resilient HBP algorithms; inconsistent data views; list ranking; machine parameters; matrix multiplication; multicore algorithms; multicore environment; multithreaded block-resilient HBP computations; multithreaded block-resilient hierarchical balanced parallel computations; parallel block; parallel tasks generation; private cache; resource oblivious algorithms; Algorithm design and analysis; Computational modeling; Delay; Instruction sets; Multicore processing; Upper bound; cache-efficiency; false-sharing; multicores;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location :
Shanghai
ISSN :
1530-2075
Print_ISBN :
978-1-4673-0975-2
Type :
conf
DOI :
10.1109/IPDPS.2012.28
Filename :
6267836
Link To Document :
بازگشت