Title :
PAC learning for Markov decision processes and dynamic games
Author :
Jain, Rahul ; Varaiya, Pravin P.
Author_Institution :
EECS Department, University of California, Berkeley, CA, USA
Date :
27 June-2 July 2004
Abstract :
We extend the probably approximately correct (PAC) model of learning to Markov decision processes (MDPs) and dynamic games. We obtain simulation-based uniform sample complexity bounds for value function estimates of discounted reward MDPs. We also obtain uniform sample complexity results for Markov games with a finite number of players.
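The abstract's "simulation-based" value function estimates refer to estimating the discounted reward of a policy by averaging returns from sampled trajectories, with the sample complexity bounding how many trajectories are needed for a given accuracy. A minimal sketch of such a Monte Carlo estimator, under assumed names (`simulate_step`, `policy`, `estimate_value` are all hypothetical, not from the paper):

```python
import random


def estimate_value(simulate_step, start_state, policy, gamma=0.9,
                   horizon=50, num_samples=1000, seed=0):
    """Monte Carlo estimate of the discounted value of `policy` at `start_state`.

    simulate_step(state, action, rng) -> (next_state, reward) is an assumed
    simulator interface; `horizon` truncates the infinite discounted sum,
    and `num_samples` trajectories are averaged.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        state, ret, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            action = policy(state)
            state, reward = simulate_step(state, action, rng)
            ret += discount * reward
            discount *= gamma  # accumulate gamma^t for the discounted sum
        total += ret
    return total / num_samples
```

PAC-style results of the kind the abstract describes then bound, with high probability, the gap between this empirical average and the true value, uniformly over the relevant class of policies or value functions.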
Keywords :
Markov processes; decision theory; game theory; Markov decision process; Markov game; dynamic game; function estimation; probably approximately correct learning; sample complexity bound; contracts; convergence; noise generators; state-space methods; stochastic processes
Conference_Title :
Proceedings of the International Symposium on Information Theory (ISIT 2004)
Print_ISBN :
0-7803-8280-3
DOI :
10.1109/ISIT.2004.1365505