Title :
Reinforcement learning without an explicit terminal state
Author :
Riedmiller, Martin
Author_Institution :
Inst. für Logik, Komplexität und Deduktionssyst., Karlsruhe Univ., Germany
Abstract :
Introduces a reinforcement learning framework based on dynamic programming for a class of control problems in which no explicit terminal state exists. This situation occurs especially in technical process control: the control task is not terminated once a predefined target value is reached; instead, the controller must continue to control the system to prevent its output from drifting away from the target value again. We propose a set of assumptions and prove the convergence of the value iteration method under them. From this we derive a new algorithm, which we call the fixed horizon algorithm. The performance of the proposed algorithm is compared to an approach that assumes the existence of an explicit terminal state. An application to a cart/double-pole system finally demonstrates the approach on a difficult practical control task.
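Illustration (not from the paper): the abstract describes backing up costs over a fixed horizon instead of iterating until an absorbing terminal state is reached. The following Python sketch shows that general idea on a small, hypothetical discretized control problem; the state/action sizes, transition table, and per-step cost are illustrative assumptions, not the paper's formulation.

    import numpy as np

    # Minimal sketch of fixed-horizon value iteration on a small discretized
    # control problem. All names (n_states, n_actions, step_cost, transition)
    # are illustrative assumptions.
    n_states, n_actions, horizon = 50, 3, 20
    rng = np.random.default_rng(0)

    # Hypothetical deterministic transition table and a per-step cost that
    # penalizes distance from a target state -- there is no terminal state,
    # so every step incurs a cost.
    transition = rng.integers(0, n_states, size=(n_states, n_actions))
    target = n_states // 2
    step_cost = np.abs(np.arange(n_states) - target) / n_states

    # Back up cost-to-go over a fixed horizon instead of iterating until
    # absorption in a terminal state.
    V = np.zeros(n_states)                       # cost-to-go with 0 steps remaining
    for _ in range(horizon):
        Q = step_cost[:, None] + V[transition]   # Q[s, a] = c(s) + V(f(s, a))
        V = Q.min(axis=1)                        # greedy backup

    policy = Q.argmin(axis=1)                    # greedy controller over the horizon
    print(V[target], policy[target])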
Keywords :
convergence; dynamic programming; iterative methods; learning (artificial intelligence); neurocontrollers; position control; process control; self-adjusting systems; cart/double pole-system; dynamic programming; fixed horizon algorithm; reinforcement learning; technical process control; value iteration method; Chemical reactors; Control systems; Convergence; Cost function; Dynamic programming; Electronic mail; Learning; Optimal control; Process control; Temperature control;
Conference_Titel :
1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence
Conference_Location :
Anchorage, AK
Print_ISBN :
0-7803-4859-1
DOI :
10.1109/IJCNN.1998.687166