Regional Information Center for Science and Technology - Constrained reinforcement learning from intrinsic and extrinsic rewards

DocumentCode :

1861684

Title :

Constrained reinforcement learning from intrinsic and extrinsic rewards

Author :

Uchibe, Eiji ; Doya, Kenji

Author_Institution :

Okinawa Inst. of Sci. & Technol., Okinawa

fYear :

2007

fDate :

11-13 July 2007

Firstpage :

163

Lastpage :

168

Abstract :

The main objective of a standard reinforcement learner is usually defined as maximizing a scalar reward function supplied externally by the environment. In contrast, an intrinsically motivated reinforcement learner creates an intrinsic reward function from its own criteria, such as curiosity, prediction error, and learning progress. This paper proposes a novel approach that handles both intrinsic and extrinsic rewards in reinforcement learning from the viewpoint of constrained optimization. The extrinsic rewards define inequality constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning agent. By integrating policy gradient reinforcement learning algorithms with techniques from nonlinear programming, the proposed method, named constrained policy gradient reinforcement learning (CPGRL), maximizes the long-term average intrinsic reward under the inequality constraints induced by the extrinsic rewards. CPGRL is successfully applied to a simple MDP problem and to a robot arm control task.
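
The idea in the abstract, maximizing average intrinsic reward while keeping average extrinsic reward above a threshold, can be illustrated with a toy sketch. This is not the authors' CPGRL algorithm: it uses a three-armed bandit instead of an MDP, exact gradients instead of sampled policy gradient estimates, and a crude switching rule as a stand-in for the nonlinear programming techniques the abstract refers to. The reward vectors `R_INT` and `R_EXT`, the threshold `G`, and all function names are hypothetical values chosen for illustration.

```python
import math

def softmax(theta):
    """Stochastic policy over arms, parameterized by logits theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected(p, rewards):
    """Average reward under policy p."""
    return sum(pi * ri for pi, ri in zip(p, rewards))

def grad_expected(p, rewards):
    """Exact gradient of the average reward w.r.t. the softmax logits:
    d/d theta_a = p_a * (r_a - E[r])."""
    mean = expected(p, rewards)
    return [pi * (ri - mean) for pi, ri in zip(p, rewards)]

def train(r_int, r_ext, threshold, steps=5000, lr=0.05):
    """Switching rule (an assumption, not the paper's update):
    if the extrinsic constraint is violated, ascend the extrinsic reward;
    otherwise ascend the intrinsic reward."""
    theta = [0.0] * len(r_int)
    for _ in range(steps):
        p = softmax(theta)
        if expected(p, r_ext) < threshold:
            g = grad_expected(p, r_ext)   # move back toward feasibility
        else:
            g = grad_expected(p, r_int)   # improve the objective
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return softmax(theta)

# Hypothetical rewards that pull in opposite directions.
R_INT = [1.0, 0.5, 0.0]   # intrinsic reward per arm (objective)
R_EXT = [0.0, 0.5, 1.0]   # extrinsic reward per arm (constraint)
G = 0.4                   # required average extrinsic reward

policy = train(R_INT, R_EXT, G)
avg_int = expected(policy, R_INT)
avg_ext = expected(policy, R_EXT)
print(f"policy={policy}, avg_int={avg_int:.3f}, avg_ext={avg_ext:.3f}")
```

Without the constraint, the policy would put all probability on the first arm and earn no extrinsic reward. With it, the policy settles near the constraint boundary, where the average extrinsic reward stays close to `G` and the intrinsic reward is as large as the constraint permits.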

Keywords :

Markov processes; constraint handling; decision making; learning (artificial intelligence); manipulator dynamics; multi-agent systems; nonlinear programming; CPGRL method; MDP problem; Markov decision process; constrained optimization problem; constrained policy gradient reinforcement learning; curiosity; extrinsic rewards; inequality constraints; intrinsic rewards; learning agent; learning progress; nonlinear programming; prediction error; robot arm control task; scalar reward function maximization; stochastic policy; Algorithm design and analysis; Cities and towns; Constraint optimization; Learning; Linear programming; Orbital robotics; Predictive models; Robot programming; Stochastic processes; Intrinsic and extrinsic rewards; reinforcement learning;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Development and Learning, 2007. ICDL 2007. IEEE 6th International Conference on

Conference_Location :

London

Print_ISBN :

978-1-4244-1116-0

Electronic_ISBN :

978-1-4244-1116-0

Type :

conf

DOI :

10.1109/DEVLRN.2007.4354030

Filename :

4354030

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1861684