DocumentCode
2606756
Title
Approximate value iteration and temporal-difference learning
Author
De Farias, Daniela Pucci ; Van Roy, Benjamin
Author_Institution
Stanford Univ., CA, USA
fYear
2000
fDate
2000
Firstpage
48
Lastpage
51
Abstract
In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in communication networks to inter-temporal investment, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Unfortunately, due to the curse of dimensionality, the associated computational requirements become intractable in most practical contexts. The paper discusses approximate value iteration, an algorithm that tries to alleviate the curse of dimensionality. We present potential problems associated with it and special cases in which desirable results are obtained. We also discuss the relationship between approximate value iteration and temporal-difference learning.
Keywords
Markov processes; approximation theory; decision theory; iterative methods; learning (artificial intelligence); optimal control; stochastic systems; approximate value iteration; sequential decision problems; temporal-difference learning; Approximation methods; Communication networks; Communication system control; Context; Dynamic programming; Functional programming; Heuristic algorithms; Investments; State-space methods; USA Councils;
fLanguage
English
Publisher
ieee
Conference_Titel
Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000
Conference_Location
Lake Louise, Alta.
Print_ISBN
0-7803-5800-7
Type
conf
DOI
10.1109/ASSPCC.2000.882445
Filename
882445