DocumentCode :
2136440
Title :
Dual-core execution: building a highly scalable single-thread instruction window
Author :
Zhou, Huiyang
Author_Institution :
Sch. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
fYear :
2005
fDate :
17-21 Sept. 2005
Firstpage :
231
Lastpage :
242
Abstract :
Current integration trends embrace the prosperity of single-chip multi-core processors. Although multi-core processors deliver significantly improved system throughput, single-thread performance is not addressed. In this paper, we propose a new execution paradigm that utilizes multi-cores on a single chip collaboratively to achieve high performance for single-thread memory-intensive workloads while maintaining the flexibility to support multithreaded applications. The proposed execution paradigm, dual-core execution, consists of two superscalar cores (a front and back processor) coupled with a queue. The front processor fetches and preprocesses instruction streams and retires processed instructions into the queue for the back processor to consume. The front processor executes instructions as usual except for cache-missing loads, which produce an invalid value instead of blocking the pipeline. As a result, the front processor runs far ahead to warm up the data caches and fix branch mispredictions for the back processor. In-flight instructions are distributed in the front processor, the queue, and the back processor, forming a very large instruction window for single-thread out-of-order execution. The proposed architecture incurs only minor hardware changes and does not require any large centralized structures such as large register files, issue queues, load/store queues, or reorder buffers. Experimental results show remarkable latency hiding capabilities of the proposed architecture, even outperforming more complex single-thread processors with much larger instruction windows than the front or back processor.
Keywords :
cache storage; multi-threading; multiprocessing systems; dual-core execution; highly scalable single-thread instruction window; issue queues; load-store queues; multithreaded applications; register files; reorder buffers; single-chip multicore processors; single-thread memory-intensive workloads; single-thread out-of-order execution; Buffer storage; Collaborative work; Delay; Hardware; Multicore processing; Out of order; Pipelines; Registers; Throughput; Windows;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005)
ISSN :
1089-795X
Print_ISBN :
0-7695-2429-X
Type :
conf
DOI :
10.1109/PACT.2005.18
Filename :
1515596