Title :
Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors
Author :
Ivan Ratkovic;Oscar Palomar;Milan Stanic;Milovan Duric;Djordje Peic;Osman Unsal;Adrian Cristal;Mateo Valero
Date :
7/1/2015 12:00:00 AM
Abstract :
Although touted as a power- and energy-efficient solution for workloads that exhibit data-level parallelism, vector processors have not been sufficiently explored from a low-power perspective. There is therefore a need to explore vector computational units from a low-power angle. Multimedia workloads that are suitable for vector processing (such as image processing) typically have multiplication as a fundamental operation. In this paper, we perform a joint circuit-architecture design space exploration of the vector multiplier unit (VMU). For this exploration, we use various circuit- and architecture-level parameters (e.g., multiplier family and maximum vector length), tools and simulators for a 40 nm low-power technology, and the San Diego Vision Benchmark Suite. We examine the advantages and side effects of using multiple vector lanes and show how it performs across the frequency spectrum to achieve an energy- and thermal-efficient speed-up. As the final result of our exploration, we derive Pareto-optimal VMU design points. Among other findings, our exploration reveals that a Wallace VMU with 4 vector lanes and 2 pipeline stages is an optimal choice for fast, low-power mobile vector processors, while a single-lane Carry-Save Array VMU is efficient for very low power and frequency requirements.
Keywords :
"Vector processors","Pipeline processing","Timing","Power dissipation","Space exploration","Benchmark testing"
Conference_Title :
VLSI (ISVLSI), 2015 IEEE Computer Society Annual Symposium on
DOI :
10.1109/ISVLSI.2015.23