DocumentCode :
1673619
Title :
Cooperative off-policy prediction of Markov decision processes in adaptive networks
Author :
Valcarcel Macua, S. ; Jianshu Chen ; Zazo, S. ; Sayed, Ali H.
Author_Institution :
Escuela Tec. Super. de Ing. de Telecomun., Univ. Politec. de Madrid, Madrid, Spain
fYear :
2013
Firstpage :
4539
Lastpage :
4543
Abstract :
We apply diffusion strategies to propose a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The gain of cooperation, in the form of improved stability and reduced bias and variance in the prediction error, is illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
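The diffusion idea described in the abstract can be illustrated as an adapt-then-combine loop: each agent takes a local importance-weighted off-policy TD step, then averages its weight vector with its neighbors'. This is only a minimal sketch of the general principle; the random-walk MDP, the features, and the combination matrix below are illustrative placeholders, not the paper's exact GTD-based algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 5-state random-walk MDP with linear features V(s) ~ phi(s)^T w.
n_states, n_feat, n_agents = 5, 3, 4
gamma, mu = 0.9, 0.05                     # discount factor, constant step-size
Phi = rng.standard_normal((n_states, n_feat))

# Off-policy setting: behavior policy (uniform) differs from target policy.
p_target, p_behavior = 0.7, 0.5           # probability of moving "right"

# Doubly-stochastic combination matrix for a ring of 4 agents (assumed topology).
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

W = np.zeros((n_agents, n_feat))          # one weight vector per agent
s = rng.integers(0, n_states, size=n_agents)

for _ in range(2000):
    # Adapt: each agent takes an importance-weighted off-policy TD(0) step.
    for k in range(n_agents):
        right = rng.random() < p_behavior
        rho = p_target / p_behavior if right else (1 - p_target) / (1 - p_behavior)
        s_next = min(s[k] + 1, n_states - 1) if right else max(s[k] - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * Phi[s_next] @ W[k] - Phi[s[k]] @ W[k]
        W[k] = W[k] + mu * rho * delta * Phi[s[k]]
        s[k] = s_next
    # Combine: diffuse the estimates across the network.
    W = A @ W
```

After many combine steps the agents' weight vectors stay close to each other, which is the cooperation gain the abstract refers to: neighbors' information reduces the variance that a non-cooperative off-policy learner would suffer.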
Keywords :
Markov processes; cooperative systems; decision theory; learning (artificial intelligence); mean square error methods; Markov decision processes; adaptive networks; agents; behavior policy; cooperative off-policy prediction; cooperative reinforcement learning algorithm; diffusion strategies; mean-square-error performance analysis; performance improvement; Algorithm design and analysis; Eigenvalues and eigenfunctions; Learning (artificial intelligence); Linear approximation; Markov processes; Prediction algorithms; Vectors; adaptive networks; diffusion strategies; dynamic programming; gradient temporal difference; mean-square-error; reinforcement learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638519
Filename :
6638519