DocumentCode :
2606756
Title :
Approximate value iteration and temporal-difference learning
Author :
De Farias, Daniela Pucci ; Van Roy, Benjamin
Author_Institution :
Stanford Univ., CA, USA
fYear :
2000
fDate :
2000
Firstpage :
48
Lastpage :
51
Abstract :
In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in communication networks to inter-temporal investment, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Unfortunately, due to the curse of dimensionality, the associated computational requirements become intractable in most practical contexts. The paper discusses approximate value iteration, an algorithm that tries to alleviate the curse of dimensionality. We present potential problems associated with it and special cases in which desirable results are obtained. We also discuss the relationship between approximate value iteration and temporal-difference learning.
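For readers unfamiliar with the method the abstract names, the following is a minimal sketch of approximate value iteration with linear function approximation: an exact Bellman backup followed by a least-squares projection onto the span of a feature matrix. The two-state MDP, the feature matrix `Phi`, and all numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state Markov chain (illustrative, not from the paper):
# transition matrix P, reward vector r, discount factor gamma.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# Feature matrix Phi: each state is represented by a low-dimensional
# feature vector; here one feature per state, so V is approximated as
# Phi @ theta with a single parameter theta.
Phi = np.array([[1.0],
                [0.5]])

theta = np.zeros(1)
for _ in range(200):
    # Exact Bellman backup applied to the current approximation ...
    target = r + gamma * P @ (Phi @ theta)
    # ... followed by projection (least squares) back onto span(Phi).
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)

V_approx = Phi @ theta  # approximate value function over the two states
```

In this tiny example the projected iteration is a contraction and converges, but in general the projection step can destroy the contraction property of the Bellman operator, which is one source of the "potential problems" the abstract alludes to.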
Keywords :
Markov processes; approximation theory; decision theory; iterative methods; learning (artificial intelligence); optimal control; stochastic systems; approximate value iteration; sequential decision problems; temporal-difference learning; Approximation methods; Communication networks; Communication system control; Context; Dynamic programming; Functional programming; Heuristic algorithms; Investments; State-space methods; USA Councils;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000
Conference_Location :
Lake Louise, Alta.
Print_ISBN :
0-7803-5800-7
Type :
conf
DOI :
10.1109/ASSPCC.2000.882445
Filename :
882445