Author_Institution :
India Software Lab., IBM India Private Ltd., Bangalore, India
Abstract :
As the software industry moves toward more complex software, tighter integration of multiple software products, and greater emphasis on software reliability and high-level availability, software support and maintenance costs increase dramatically. Businesses must be able to monitor the health of their systems to make sure they are performing at top levels, respond to problems quickly and fix them in a timely manner, and perform advanced problem determination to reduce the total time of outages that have already occurred. It is equally important to prevent problems from occurring, based on best practices and knowledge of known problems and issues for specific software products. To achieve these goals, a powerful analysis engine is proposed that is capable of performing comprehensive health checks of customer systems and advanced problem determination based on analysis of customers' data. It can be used for both proactive and reactive customer support. Such an engine works as a virtual consultant for end users. It detects potential problems related to customer systems and installed products and proactively provides notifications or alerts, i.e., it can be considered an early detection system. It is also capable of analyzing FFDC (First Failure Data Capture) data after a problem has occurred, comparing the data with well-known problems and related symptoms from relevant knowledge databases, and providing customers with the results of the analysis, matches with previously recorded problems, and recommendations on how to fix the problem at hand. The proposed engine uses up-to-date analytics from subject matter experts and the best practices encoded in it. In the present work, a system architecture and design of such an analysis engine are presented. The proposed engine has a low barrier to adoption and a flexible, extensible design, and could easily be adopted for any software product.
It can analyze encoded human knowledge, compare collected customer data with available historical data, and report the problems and issues found along with relevant recommendations and suggested fixes. More specifically, the engine provides a comprehensive analysis in terms of health checks, best practices compliance checks, prerequisites checks, end-of-service product checks, operating environment and configuration setup checks, outage prevention, state comparison, problem determination, and others. A case study based on the proposed engine design is presented and discussed in more detail.
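The reactive path described above, comparing symptoms found in FFDC data with known problems from a knowledge database, can be illustrated with a minimal sketch. This is not the engine's actual design, which the abstract does not detail: the `KnownProblem` record, the `match_known_problems` function, the symptom strings, and the overlap-based scoring are all hypothetical choices made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownProblem:
    """Hypothetical knowledge-base entry: a recorded problem and its symptoms."""
    problem_id: str
    symptoms: frozenset
    recommendation: str


def match_known_problems(observed, knowledge_base, min_overlap=0.5):
    """Return (problem, score) pairs for known problems whose symptoms
    overlap the observed ones, best match first.

    The score is the fraction of a known problem's symptoms that were
    observed; this scoring rule is an assumption, not the paper's method.
    """
    observed = frozenset(observed)
    matches = []
    for problem in knowledge_base:
        if not problem.symptoms:
            continue  # an entry without symptoms cannot be matched
        score = len(problem.symptoms & observed) / len(problem.symptoms)
        if score >= min_overlap:
            matches.append((problem, score))
    # Highest-scoring problems come first in the report.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches


# Illustrative (made-up) knowledge base and symptoms extracted from FFDC data.
knowledge_base = [
    KnownProblem("KP-1", frozenset({"OutOfMemoryError", "heap dump"}),
                 "Review the heap size settings."),
    KnownProblem("KP-2", frozenset({"connection timeout", "pool exhausted",
                                    "slow query"}),
                 "Review the connection pool configuration."),
]
observed_symptoms = {"OutOfMemoryError", "heap dump", "slow query"}

for problem, score in match_known_problems(observed_symptoms, knowledge_base):
    print(f"{problem.problem_id}: {score:.0%} match. {problem.recommendation}")
```

A proactive health check could reuse the same structure by treating configuration facts, such as an installed product version that has reached end of service, as the observed symptoms.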
Keywords :
DP industry; data analysis; software maintenance; software reliability; FFDC data analysis; advanced problem determination; analysis engine; compliance check; configuration setup check; customer data analysis; customer systems automated health checks; date analytics; early problem detection; end-of-service product check; first failure data capture; high-level availability; operating environment check; outage prevention; prerequisites check; proactive customer support; reactive customer support; software industry; software maintenance costs; software products; software reliability; software support; state comparison; virtual consultant; Best practices; Computer architecture; Databases; Engines; Maintenance engineering; Servers; Software; Analysis engine; Automated Health Check; proactive support;