DocumentCode :
1050996
Title :
Recursive Learning Automata Approach to Markov Decision Processes
Author :
Chang, Hyeong Soo ; Fu, Michael C. ; Hu, Jiaqiao ; Marcus, Steven I.
Author_Institution :
Sogang Univ., Seoul
Volume :
52
Issue :
7
fYear :
2007
fDate :
7/1/2007 12:00:00 AM
Firstpage :
1349
Lastpage :
1355
Abstract :
In this note, we present a sampling algorithm, called recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By extending in a recursive manner Sastry´s learning automata pursuit algorithm designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Based on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we provide an analysis for the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
Keywords :
Markov processes; decision theory; finite automata; learning automata; probability; sampling methods; Markov decision process; finite-time analysis; finite-time behavior; learning automata pursuit algorithm; nonsequential stochastic optimization; probability bounds; recursive automata sampling algorithm; recursive learning automata; Algorithm design and analysis; Automatic control; Design optimization; Learning automata; Pursuit algorithms; Recursive estimation; Sampling methods; State estimation; Stochastic processes; Upper bound; Learning automata; Markov decision process (MDP); sampling;
fLanguage :
English
Journal_Title :
Automatic Control, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9286
Type :
jour
DOI :
10.1109/TAC.2007.900859
Filename :
4268369
Link To Document :
بازگشت