Incremental policy learning: an equilibrium selection algorithm for reinforcement learning agents with common interests

Author

Fulda, Nancy ; Ventura, Dan

Author_Institution

Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA

Volume

2

fYear

2004

fDate

25-29 July 2004

Firstpage

1121

Abstract

We present an equilibrium selection algorithm for reinforcement learning agents that incrementally adjusts the probability of executing each action based on the desirability of the outcome obtained in the last time step. The algorithm assumes that at least one coordination equilibrium exists and requires that the agents have a heuristic for determining whether or not the equilibrium was obtained. In deterministic environments with one or more strict coordination equilibria, the algorithm learns to play an optimal equilibrium as long as the heuristic is accurate. Empirical data demonstrate that the algorithm is also effective in stochastic environments and is able to learn good joint policies when the heuristic's parameters are estimated during learning, rather than known in advance.
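
The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea: two agents with a shared payoff each keep a probability vector over actions and shift probability toward the action just played whenever a success heuristic fires. The update used here is a linear reward-inaction rule borrowed from learning automata, which is an assumption and not necessarily the paper's rule. The names `update_policy`, `run`, `alpha`, `target`, and the payoff matrix are all hypothetical.

```python
import random


def update_policy(probs, action, success, alpha=0.1):
    """Assumed linear reward-inaction step: on success, move probability
    mass toward `action`; on failure, leave the policy unchanged."""
    if not success:
        return list(probs)
    return [
        p + alpha * (1.0 - p) if i == action else p * (1.0 - alpha)
        for i, p in enumerate(probs)
    ]


def run(payoff, target, steps=2000, alpha=0.1, seed=0):
    """Two common-interest agents learn independently in a deterministic
    matrix game. The heuristic (an assumption) reports success when the
    shared payoff reaches `target`."""
    rng = random.Random(seed)
    n, m = len(payoff), len(payoff[0])
    p1 = [1.0 / n] * n  # agent 1 starts with a uniform policy
    p2 = [1.0 / m] * m  # agent 2 starts with a uniform policy
    for _ in range(steps):
        a1 = rng.choices(range(n), weights=p1)[0]
        a2 = rng.choices(range(m), weights=p2)[0]
        success = payoff[a1][a2] >= target
        p1 = update_policy(p1, a1, success, alpha)
        p2 = update_policy(p2, a2, success, alpha)
    return p1, p2


# Hypothetical game: (0, 0) is the optimal coordination equilibrium,
# (1, 1) and (2, 2) are suboptimal ones.
PAYOFF = [
    [10, 0, 0],
    [0, 5, 0],
    [0, 0, 5],
]

if __name__ == "__main__":
    policy1, policy2 = run(PAYOFF, target=10)
    print(policy1, policy2)
```

In this sketch the heuristic is accurate by construction (`target` equals the optimal payoff), so only the joint action (0, 0) is ever reinforced. Estimating `target` during learning, as the abstract mentions, would need additional logic that is not shown here.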

Keywords

learning (artificial intelligence); multi-agent systems; optimisation; probability; stochastic processes; equilibrium selection algorithm; incremental policy learning; optimal equilibrium; probability; reinforcement learning agents; stochastic environments; Computer science; Learning; Minimax techniques; Parameter estimation; Stochastic processes; Taxonomy;

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on

ISSN

1098-7576

Print_ISBN

0-7803-8359-1

Type

conf

DOI

10.1109/IJCNN.2004.1380091

Filename

1380091