Title :
Workload-Cognizant Concurrent Error Detection in the Scheduler of a Modern Microprocessor
Author :
Karimi, Naghmeh ; Maniatakos, Michail ; Jas, Abhijit ; Tirumurti, Chandra ; Makris, Yiorgos
Author_Institution :
Dept. of Electr. Eng., Duke Univ., Durham, NC, USA
Abstract :
We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. Thereby, we identify the most susceptible aspects of instruction execution and we accordingly distribute CED resources to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of different SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is in the order of a few clock cycles, implying that efficient recovery methods can be developed.
Keywords :
computer architecture; dynamic scheduling; error detection; fault tolerant computing; microprocessor chips; processor scheduling; Alpha-like superscalar microprocessor; application-level error masking; hybrid branch prediction; instruction execution; microprocessor architectural state; modern microprocessor scheduling; out-of-order execution capabilities; program execution; workload-cognizant concurrent error detection; Circuit faults; Clocks; Hardware; Hardware design languages; Hazards; Microprocessors; Registers; Concurrent error detection; invariance; microprocessor; scheduler
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2010.265