Title :
Tarantula: a vector extension to the alpha architecture
Author :
Espasa, Roger ; Ardanaz, Federico ; Emer, Joel ; Felix, Sarah ; Gago, Julio ; Gramunt, Roger ; Hernandez, Isaac ; Juan, Toni ; Lowney, Geoff ; Mattina, Matthew ; Seznec, André
Author_Institution :
Compaq-UPC Microprocessor Lab., Univ. Politecnica de Catalunya, Barcelona, Spain
fDate :
2002
Abstract :
Tarantula is an aggressive floating-point machine targeted at technical, scientific and bioinformatics workloads, originally planned as a follow-on candidate to the EV8 processor. Tarantula adds to the EV8 core a vector unit capable of 32 double-precision flops per cycle. The vector unit fetches data directly from a 16 MByte second-level cache with a peak bandwidth of sixty-four 64-bit values per cycle. The whole chip is backed by a memory controller capable of delivering over 64 GBytes/s of raw bandwidth. Tarantula extends the Alpha ISA with new vector instructions that operate on new architectural state. Salient features of the architecture and implementation are: (1) it fully integrates into a virtual-memory cache-coherent system without changes to its coherency protocol, (2) provides high bandwidth for non-unit stride memory accesses, (3) supports gather/scatter instructions efficiently, (4) fully integrates with the EV8 core with a narrow, streamlined interface, rather than acting as a co-processor, (5) can achieve a peak of 104 operations per cycle, and (6) achieves excellent "real-computation" per transistor and per watt ratios. Our detailed simulations show that Tarantula achieves an average speedup of 5X over EV8, out of a peak speedup in terms of flops of 8X. Furthermore, performance on gather/scatter-intensive benchmarks such as Radix Sort is also remarkable: a speedup of almost 3X over EV8 and 15 sustained operations per cycle. Several benchmarks exceed 20 operations per cycle.
Keywords :
performance evaluation; protocols; storage management; vector processor systems; Alpha architecture; EV8 core; Tarantula; bioinformatics workloads; coherency protocol; floating point machine; memory controller; radix sort; vector extension; virtual-memory cache-coherent system; Access protocols; Bandwidth; Bioinformatics; CMOS technology; Communication system control; Computer architecture; Coprocessors; Instruction sets; Microprocessors; Scattering;
Conference_Titel :
Computer Architecture, 2002. Proceedings. 29th Annual International Symposium on
Conference_Location :
Anchorage, AK
Print_ISBN :
0-7695-1605-X
DOI :
10.1109/ISCA.2002.1003586