Title :
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Author :
Choi, Seungryul; Yeung, Donald
Author_Institution :
Dept. of Computer Science, University of Maryland
Abstract :
The key to high performance in simultaneous multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline. We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning on-line, learning time is crucial. We develop a hill-climbing algorithm that efficiently learns the best distribution of resources by following the performance gradient within the resource distribution space. This paper conducts an in-depth investigation of learning-based SMT resource distribution. First, we compare existing resource distribution techniques to an ideal learning-based technique that performs learning off-line. This limit study shows learning-based techniques can provide up to 19.2% gain over ICOUNT, 18.0% gain over FLUSH, and 7.6% gain over DCRA across 21 multithreaded workloads. Then, we present an on-line learning algorithm based on hill-climbing. Our evaluation shows hill-climbing provides a 12.4% gain over ICOUNT, 11.3% gain over FLUSH, and 2.4% gain over DCRA across a larger set of 42 multiprogrammed workloads.
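To illustrate the hill-climbing idea described in the abstract, the following minimal Python sketch perturbs per-thread resource shares each epoch, samples the resulting performance, and moves to the best-performing neighbor. It is a hypothetical illustration only, not the paper's actual hardware mechanism; the names measure_ipc, step, and epochs are assumptions introduced here for clarity.

```python
def hill_climb(num_threads, total_resources, measure_ipc, epochs=1000, step=4):
    """Hill-climb over the resource-distribution space using runtime feedback.

    measure_ipc(shares) is assumed to return the performance (e.g., IPC)
    observed when the shared resource is partitioned according to `shares`.
    """
    # Start from an equal partition of the shared resource among threads.
    shares = [total_resources // num_threads] * num_threads

    for _ in range(epochs):
        base_ipc = measure_ipc(shares)

        # Probe neighbors: shift `step` units of the resource from thread j to thread i.
        best_shares, best_ipc = shares, base_ipc
        for i in range(num_threads):
            for j in range(num_threads):
                if i == j or shares[j] < step:
                    continue
                trial = list(shares)
                trial[i] += step
                trial[j] -= step
                ipc = measure_ipc(trial)
                if ipc > best_ipc:
                    best_shares, best_ipc = trial, ipc

        # Follow the locally best direction (an empirical performance gradient);
        # if no neighbor improves on the current partition, keep it.
        shares = best_shares

    return shares
```

In a processor, measure_ipc would correspond to running one epoch with the trial partition and reading performance counters; here it simply stands in for that runtime feedback loop.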
Keywords :
learning (artificial intelligence); multi-threading; multiprocessing programs; multiprocessing systems; performance evaluation; pipeline processing; resource allocation; SMT processor resource distribution; active threads; cache miss counts; hill-climbing; instruction occupancy; multiprogrammed workloads; multithreaded workloads; online learning algorithm; performance gradient; shared resources; simultaneous multithreaded processors; Computer science; Hardware; Monitoring; Multithreading; Performance gain; Pipelines; Resource management; Runtime;
Conference_Title :
33rd International Symposium on Computer Architecture (ISCA '06), 2006
Conference_Location :
Boston, MA
Print_ISBN :
0-7695-2608-X
DOI :
10.1109/ISCA.2006.25