Title :
The design and use of persistent memory on the DNCP hardware fault-tolerant platform
Author :
Bressoud, Thomas C. ; Clark, Tom ; Kan, Ti
Author_Institution :
Bell Labs., Lucent Technol., Murray Hill, NJ, USA
Abstract :
Systems designed to recover from failures caused by software faults in the operating system and/or application typically require a means of persistently storing a subset of the application's state. Disk drives are most often used for this persistent storage, but at a performance cost incurred repeatedly during normal execution and again at recovery time. Academic work has pioneered the concept of taking a region of conventional memory, protecting it, and making it persist across operating system crashes and reboots so that it is as reliable as a disk. Such memory can be used in place of a disk to alleviate the performance penalties noted above. This paper describes a project that applies these concepts in a RAM disk-based realization of persistent memory (PM) as part of the Lucent DNCP (Distributed Network Control Platform) hardware fault-tolerant platform, implemented for the HP-UX operating system, and focuses on its use by a main-memory database (MMDB) system. Although we found that the reduction in recovery time was small relative to the reboot time, we achieved a nearly 40% reduction in execution time for an MMDB benchmark run on the PM, compared with the MMDB's normal use of a disk for recoverability.
Keywords :
Unix; database management systems; fault tolerant computing; performance evaluation; persistent objects; random-access storage; system recovery; Distributed Network Control Platform; HP-UX operating system; Lucent DNCP hardware fault-tolerant platform; RAM disk; Unix; application software faults; disk drives; execution time; main-memory database system; operating system crashes; operating system software faults; performability; performance cost; performance penalties; persistent memory; protected memory region; reboot time; recoverability; recovery time; reliability; state subset storage; system failure recovery; Application software; Computer crashes; Costs; Disk drives; Fault tolerance; Hardware; Operating systems; Protection; Random access memory; Read-write memory;
Conference_Title :
Dependable Systems and Networks, 2001. DSN 2001. International Conference on
Conference_Location :
Goteborg, Sweden
Print_ISBN :
0-7695-1101-5
DOI :
10.1109/DSN.2001.941433