Regional Information Center for Science and Technology - Approximate value iteration and temporal-difference learning

DocumentCode :

2606756

Title :

Approximate value iteration and temporal-difference learning

Author :

De Farias, Daniela Pucci ; Van Roy, Benjamin

Author_Institution :

Stanford Univ., CA, USA

fYear :

2000

fDate :

2000

Firstpage :

Lastpage :

Abstract :

In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in communication networks to inter-temporal investment, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Unfortunately, due to the curse of dimensionality, the associated computational requirements become intractable in most practical contexts. The paper discusses approximate value iteration, an algorithm that tries to alleviate the curse of dimensionality. We present potential problems associated with it and special cases in which desirable results are obtained. We also discuss the relationship between approximate value iteration and temporal-difference learning.

Keywords :

Markov processes; approximation theory; decision theory; iterative methods; learning (artificial intelligence); optimal control; stochastic systems; approximate value iteration; sequential decision problems; temporal-difference learning; Approximation methods; Communication networks; Communication system control; Context; Dynamic programming; Functional programming; Heuristic algorithms; Investments; State-space methods; USA Councils;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000

Conference_Location :

Lake Louise, Alta.

Print_ISBN :

0-7803-5800-7

Type :

conf

DOI :

10.1109/ASSPCC.2000.882445

Filename :

882445

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2606756