DocumentCode :
1705561
Title :
Decoupled vector architectures
Author :
Espasa, Roger ; Valero, Mateo
Author_Institution :
Dept. d'Arquitectura de Computadors, Univ. Politecnica de Catalunya, Barcelona, Spain
fYear :
1996
Firstpage :
281
Lastpage :
290
Abstract :
The purpose of this paper is to show that, by using decoupling techniques in a vector processor, the performance of vector programs can be greatly improved. Using a trace-driven approach, we simulate a selection of the Perfect Club programs and compare their execution time on a conventional vector architecture and on a decoupled vector architecture. Decoupling provides a performance advantage of more than a factor of two for realistic memory latencies, and even with an ideal memory system with no latency, there is still a speedup of as much as 50%. A bypassing technique between the load/store queues is introduced, and we show that it can give an extra speedup of up to 22% while also reducing total memory traffic by an average of 20%. An important part of this paper is devoted to studying the tradeoffs involved in choosing an adequate size for the different queues of the architecture, so that the hardware cost of the queues can be minimized while still retaining most of the performance advantages of decoupling.
Keywords :
performance evaluation; vector processor systems; Perfect Club programs; bypassing technique; decoupled vector architectures; hardware cost; performance; performance advantages; realistic memory latencies; total memory traffic; trace driven approach; vector processor; Computational modeling; Computer aided instruction; Computer architecture; Costs; Delay; Hardware; Multithreading; Parallel processing; Vector processors; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High-Performance Computer Architecture, 1996. Proceedings., Second International Symposium on
Conference_Location :
San Jose, CA
Print_ISBN :
0-8186-7237-4
Type :
conf
DOI :
10.1109/HPCA.1996.501193
Filename :
501193