Abstract:
The core of proactive system management is the exploitation of the built-in intelligence of computer infrastructures to implement self-∗ properties that assure a guaranteed quality of service even in the presence of faults, by reacting to them before they affect services. Creating and implementing a proper system control policy is always highly challenging. Modern private and public infrastructures are extremely large; accordingly, fault recovery relies to an increasing extent on coarse-grained reconfiguration. Traditional fine-grained fault-handling mechanisms based on discrete representations therefore need to be complemented with efficient system-level policies using continuous quantitative system models. A continuous observation-learning-policy-improvement process is needed to build up and maintain efficient control policies. The most effective approach is the gradual refinement of an initial control policy by observing the system's behavior under that policy. Subsequently, the quality of control is improved by processing the basic information collected in the form of logs. Note that this round trip is the only way to cope with typically rapidly evolving and changing application environments. The presentation will provide an overview of how data acquisition, log processing, modern signal processing, and artificial intelligence relate to the knowledge extraction needed for a proper, empirical-model-based system monitoring and control policy. The basic approaches and tools will be illustrated by a complete example workflow motivated by recent industry-sponsored research projects at BME on large-scale infrastructure control.