• DocumentCode
    2606756
  • Title

    Approximate value iteration and temporal-difference learning

  • Author

    De Farias, Daniela Pucci ; Van Roy, Benjamin

  • Author_Institution
    Stanford Univ., CA, USA
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    48
  • Lastpage
    51
  • Abstract
    In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in communication networks to inter-temporal investment, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Unfortunately, due to the curse of dimensionality, the associated computational requirements become intractable in most practical contexts. The paper discusses approximate value iteration, an algorithm that tries to alleviate the curse of dimensionality. We present potential problems associated with it and special cases in which desirable results are obtained. We also discuss the relationship between approximate value iteration and temporal-difference learning.
  • Keywords
    Markov processes; approximation theory; decision theory; iterative methods; learning (artificial intelligence); optimal control; stochastic systems; approximate value iteration; sequential decision problems; temporal-difference learning; Approximation methods; Communication networks; Communication system control; Context; Dynamic programming; Functional programming; Heuristic algorithms; Investments; State-space methods; USA Councils;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000
  • Conference_Location
    Lake Louise, Alta.
  • Print_ISBN
    0-7803-5800-7
  • Type

    conf

  • DOI
    10.1109/ASSPCC.2000.882445
  • Filename
    882445