DocumentCode :
3706760
Title :
Communication Avoiding Power Scaling
Author :
John Leidel;Yong Chen
Author_Institution :
Whitacre Coll. of Eng., Texas Tech Univ., Lubbock, TX, USA
fYear :
2015
Firstpage :
166
Lastpage :
174
Abstract :
Recent system-on-chip (SoC) techniques have permitted the continued scaling of core densities at a rate sufficient to track Moore's Law. However, this continued increase in transistor density has warranted new hardware features in order to sufficiently scale the degree of on-chip concurrency. Features such as complex multi-level caches, hierarchical core configurations and hardware-assisted threading have increased the overall energy requirements of the SoC and decreased the programmer's ability to realize efficient scaling. This increase in overall system power requirements has resulted in research and development activities associated with hardware techniques such as dynamic frequency scaling and software techniques such as power-aware, fine-grained thread scheduling algorithms. We present the basis for a third area of research: power-scaling algorithmic complexity. The goal of this research focus is to describe techniques by which one may weigh the timing and power derivatives of competitive parallel algorithms in order to provide data necessary to make algorithmic choices based upon both the projected performance and the expected power requirements. This work presents a model and associated technique to describe the relative energy performance scaling characteristics of parallel and mixed parallel-sequential algorithms. The model and equations are then applied to a study of matrix multiplication techniques on a symmetric multiprocessing platform. We utilize a tuned OpenBLAS blocking matrix multiplication, a classic parallel Strassen-Winograd technique and a Communication Avoiding Parallel Strassen (CAPS) technique to elicit the relative energy performance scaling on our aforementioned platform. In doing so, we show that while a blocking matrix multiplication may provide the highest potential performance on our platform, both the Strassen and CAPS techniques have ideal energy scaling properties. Furthermore, we show that by reducing the communication requirements of Strassen multiplication, we have the ability to gain a slight improvement in power scaling over traditional Strassen implementations.
Keywords :
"Power measurement","Heuristic algorithms","Mathematical model","System-on-chip","Computer architecture","Monitoring","Hardware"
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops (ICPPW), 2015 44th International Conference on
ISSN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2015.26
Filename :
7349908