Title :
A methodology for detection and estimation of software aging
Author :
Garg, Sachin ; Van Moorsel, Aad ; Vaidyanathan, Kalyanaraman ; Trivedi, Kishor S.
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
The phenomenon of software aging refers to the accumulation of errors during the execution of the software, which eventually results in its crash/hang failure. A gradual performance degradation may also accompany software aging. Proactive fault management techniques such as “software rejuvenation” (Y. Huang et al., 1995) may be used to counteract aging if it exists. We propose a methodology for detection and estimation of aging in the UNIX operating system. First, we present the design and implementation of an SNMP-based, distributed monitoring tool used to collect operating system resource usage and system activity data at regular intervals from networked UNIX workstations. Statistical trend detection techniques are applied to this data to detect and validate the existence of aging. To quantify the effect of aging on operating system resources, we propose a metric, “estimated time to exhaustion”, which is calculated using well-known slope estimation techniques. Although the distributed data collection tool is specific to UNIX, the statistical techniques can be used for detection and estimation of aging in other software as well.
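The abstract does not name the specific trend test or slope estimator, so the following is only a minimal sketch under stated assumptions. It uses the Mann-Kendall test for trend detection and Sen's slope estimator, two common nonparametric choices, to compute an “estimated time to exhaustion” for one monitored resource. The function names, the sample data, and the rule of extrapolating from the last observed value are all hypothetical illustrations, not the authors' implementation.

```python
import math
from statistics import median


def mann_kendall(xs):
    """Mann-Kendall test for a monotonic trend (simplified: no tie correction).

    Returns (S, Z, p): the Kendall S statistic, its normal approximation Z,
    and the two-sided p-value.
    """
    n = len(xs)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = xs[j] - xs[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return s, z, p


def sen_slope(ts, xs):
    """Sen's estimator: the median of all pairwise slopes."""
    slopes = [
        (xs[j] - xs[i]) / (ts[j] - ts[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if ts[j] != ts[i]
    ]
    if not slopes:
        raise ValueError("need at least two distinct sample times")
    return median(slopes)


def time_to_exhaustion(ts, xs, limit):
    """Hypothetical rule: extrapolate from the last sample at Sen's slope
    until the resource reaches `limit` (e.g. 0 for free memory).
    Returns math.inf if the trend never reaches the limit."""
    slope = sen_slope(ts, xs)
    if slope == 0:
        return math.inf
    tte = (limit - xs[-1]) / slope
    return tte if tte >= 0 else math.inf


if __name__ == "__main__":
    # Hypothetical hourly samples of free memory (MB) on one workstation.
    ts = [0, 1, 2, 3, 4, 5]
    xs = [100, 96, 93, 88, 85, 80]
    s, z, p = mann_kendall(xs)
    print("S =", s, "Z =", round(z, 2), "p =", round(p, 4))
    print("Sen slope:", sen_slope(ts, xs), "MB/hour")
    print("Estimated time to exhaustion:", time_to_exhaustion(ts, xs, 0), "hours")
```

With these made-up samples every pair decreases, so S is -15 and the trend is significant at the 5% level; Sen's slope is -4 MB/hour, and with 80 MB remaining the sketch estimates 20 hours to exhaustion. Nonparametric estimators of this kind are a reasonable fit for monitoring data because a single outlying sample barely moves a median of pairwise slopes.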
Keywords :
Unix; software fault tolerance; software maintenance; system monitoring; SNMP based distributed monitoring tool; UNIX operating system; distributed data collection tool; error accumulation; estimated time to exhaustion; networked UNIX workstations; operating system resource usage; performance degradation; proactive fault management techniques; slope estimation techniques; software aging detection; software aging estimation; software rejuvenation; statistical techniques; statistical trend detection techniques; system activity data; Aging; Application software; Computer errors; Degradation; Electrical capacitance tomography; Hardware; Monitoring; Operating systems; Read only memory; Software safety;
Conference_Title :
Software Reliability Engineering, 1998. Proceedings. The Ninth International Symposium on
Conference_Location :
Paderborn
Print_ISBN :
0-8186-8991-9
DOI :
10.1109/ISSRE.1998.730892