DocumentCode
167549
Title
Managing Soft-Errors in Transactional Systems
Author
Mohamedin, Mohamed ; Palmieri, Roberto ; Ravindran, Binoy
Author_Institution
Electr. & Comput. Eng. Dept., Virginia Tech, Blacksburg, VA, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
1324
Lastpage
1329
Abstract
Multicore architectures are becoming increasingly prone to soft-errors, i.e., transient faults caused by external physical phenomena such as electric noise and cosmic particle strikes. With increasing core counts, the soft-error rate is growing due to the increasing transistor density on chips. The impact of these errors on business-critical applications that are being deployed on multicore hardware can be significant. We present an active replication-based approach that fully masks such errors for transactional applications. We partition computational cores, fully replicate objects across partitions, and concurrently execute transactional requests on all partitions, thereby enabling completely local object accesses. Transactional requests are globally ordered and delivered across partitions using optimistic atomic broadcast. Hardware message passing, an important emerging trend in multicore architectures, is exploited to mitigate communication costs. We report preliminary results obtained with an implementation of our approach on 36-core Tilera TILE-Gx hardware with an on-chip scalable mesh network.
Keywords
computer architecture; concurrency control; multiprocessing systems; radiation hardening (electronics); Tilera TILE-Gx hardware; active replication-based approach; business-critical applications; communication cost mitigation; computational core partitioning; concurrent transactional request execution; core counts; cosmic particle; electric noise; error masking; external physical phenomena; globally delivered transactional requests; globally ordered transactional requests; hardware message passing; local object access; multicore architectures; multicore hardware; object replication; on-chip scalable mesh network; optimistic atomic broadcast; soft-error management; soft-error rate; transactional applications; transactional systems; transient faults; transistor density; Concurrency control; Hardware; Message systems; Multicore processing; Protocols; Throughput; Active Replication; Soft Errors; Transaction Processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location
Phoenix, AZ
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.148
Filename
6969532