Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Abstract :
Service availability and QoS, measured in terms of customer-affecting performance metrics, are crucial for service systems. However, the increasing complexity of distributed service systems creates hidden space for software faults, which undermine system availability and lead to failures or even downtime. In this paper, we introduce a composition technique, Coordinated Selective Rejuvenation, that automates the whole process of faulty component identification and rejuvenation arbitration in order to guarantee a distributed service system's customer-affecting metrics. We evaluate it with fault injection experiments on RUBiS, which simulates a distributed eCommerce site modeled on eBay.com. The results indicate that our request path analysis approach and system modeling technique are effective for locating faulty components, and that the Bayesian network technique is feasible for fault pinpointing in the context of request tracing. Meanwhile, the arbitration scheme can effectively guarantee system QoS by identifying and rejuvenating the tier most likely to contain the performance fault before the degradation of customer-affecting performance metrics becomes severe.
Keywords :
Web services; customer relationship management; electronic commerce; fault tolerant computing; quality of service; software performance evaluation; Bayesian network; QoS; coordinated selective rejuvenation; customer-affecting metrics; distributed eCommerce; distributed service system; eBay.com; fault component identification; fault component location; fault injection experiment; fault pinpointing; path analysis; performance metrics; rejuvenation arbitration; request tracing context; service availability; software fault; system availability