• DocumentCode
    189159
  • Title

    B^2RTDP: An Efficient Solution for Bounded-Parameter Markov Decision Process

  • Author

    Fussuma, Fernando L. ; Valdivia Delgado, Karina ; Nunes de Barros, Leliane

  • Author_Institution
    Sch. of Arts, Sci. & Humanities, Univ. of Sao Paulo, Sao Paulo, Brazil
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    128
  • Lastpage
    133
  • Abstract
    Bounded-parameter Markov decision process (BMDP) can be used to model sequential decision problems, where the transitions probabilities are not completely know and are given by intervals. One of the criteria used to solve that kind of problems is the maxim in, i.e., the best action on the worst scenario. The algorithms to solve BMDPs that use this approach include interval value iteration and an extension of real time dynamic programming (Robust-LRTDP). In this paper, we introduce a new algorithm, named B2RTDP, also based on real time dynamic programming that makes a different choice of the next state to be visited using upper and lower bounds of the optimal value function. The empirical evaluation of the algorithm shows that it converges faster than the state-of-the-art algorithms that solve BMDPs.
  • Keywords
    Markov processes; dynamic programming; sequential estimation; B2RTDP; BMDP; bounded-parameter Markov decision process; interval value iteration; optimal value function; real time dynamic programming; robust-LRTDP; sequential decision problem; transitions probability; Convergence; Dynamic programming; Equations; Heuristic algorithms; Markov processes; Real-time systems; Robustness; Dynamic Programming; Markov Decision Process; Probabilistic Planning;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type

    conf

  • DOI
    10.1109/BRACIS.2014.33
  • Filename
    6984819