Title :
Constrained reinforcement learning from intrinsic and extrinsic rewards
Author :
Uchibe, Eiji ; Doya, Kenji
Author_Institution :
Okinawa Inst. of Sci. & Technol., Okinawa
Abstract :
The main objective of a standard reinforcement learner is usually defined as the maximization of a scalar reward function given externally by the environment. In contrast, an intrinsically motivated reinforcement learner creates an intrinsic reward function from its own criteria, such as curiosity, prediction error, and learning progress. This paper proposes a novel approach that handles both intrinsic and extrinsic rewards in reinforcement learning from the viewpoint of a constrained optimization problem. The extrinsic rewards define inequality constraints on the stochastic policy, while the intrinsic reward determines the current objective function for the learning agent. By integrating policy gradient reinforcement learning algorithms with techniques used in nonlinear programming, the proposed method, named constrained policy gradient reinforcement learning (CPGRL), maximizes the long-term average intrinsic reward under the inequality constraints induced by the extrinsic rewards. CPGRL is successfully applied to a simple MDP problem and a robot arm control task.
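The optimization problem described in the abstract can be sketched in symbols as follows. This is only an illustrative formulation: the notation is assumed here and is not taken from the record. The symbols are the policy parameters \theta, the intrinsic reward r_1, the extrinsic rewards r_2, ..., r_m, and the constraint thresholds G_i. Writing the constraints in terms of long-term average extrinsic rewards is also an assumption, since the abstract states only that the extrinsic rewards induce inequality constraints.

```latex
% Illustrative sketch; all symbols are assumed notation, not quoted from the paper.
% g_i(\theta): long-term average of reward r_i under the stochastic policy \pi_\theta.
\begin{align}
  \max_{\theta} \quad & g_1(\theta)
      && \text{(average intrinsic reward)} \\
  \text{subject to} \quad & g_i(\theta) \ge G_i, \quad i = 2, \dots, m
      && \text{(constraints from extrinsic rewards)} \\
  \text{where} \quad & g_i(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=1}^{T} r_{i,t} \right].
\end{align}
```

Under this reading, policy gradient estimates of each g_i would supply the gradients that a nonlinear programming method needs to improve g_1 while keeping the policy inside the feasible region.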
Keywords :
Markov processes; constraint handling; decision making; learning (artificial intelligence); manipulator dynamics; multi-agent systems; nonlinear programming; CPGRL method; MDP problem; Markov decision process; constrained optimization problem; constrained policy gradient reinforcement learning; curiosity; extrinsic rewards; inequality constraints; intrinsic rewards; learning agent; learning progress; nonlinear programming; prediction error; robot arm control task; scalar reward function maximization; stochastic policy; Algorithm design and analysis; Cities and towns; Constraint optimization; Learning; Linear programming; Orbital robotics; Predictive models; Robot programming; Stochastic processes; Intrinsic and extrinsic rewards; reinforcement learning;
Conference_Title :
Development and Learning, 2007. ICDL 2007. IEEE 6th International Conference on
Conference_Location :
London
Print_ISBN :
978-1-4244-1116-0
Electronic_ISBN :
978-1-4244-1116-0
DOI :
10.1109/DEVLRN.2007.4354030