DocumentCode
775970
Title
Partially Observable Markov Decision Processes With Reward Information: Basic Ideas and Models
Author
Cao, Xi-Ren ; Guo, Xianping
Author_Institution
Hong Kong University of Science and Technology
Volume
52
Issue
4
fYear
2007
fDate
April 1, 2007
Firstpage
677
Lastpage
681
Abstract
In a partially observable Markov decision process (POMDP), if the reward can be observed at each step, then the observed reward history contains information about the unknown state. This information, in addition to the information contained in the observation history, can be used to update the state probability distribution. A policy obtained in this way is called a reward-information policy (RI-policy); an optimal RI-policy performs no worse than any standard optimal policy that depends only on the observation history. This observation leads to four different problem formulations for POMDPs, depending on whether the reward function is known and whether the reward at each step is observable. This exploratory work may draw attention to these interesting problems.
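For concreteness, the following minimal Python sketch (not from the paper; the model shapes, variable names, and the finite, deterministic-reward assumptions are ours) illustrates the belief update described above for the case where the reward function is known and the reward at each step is observable: states inconsistent with the observed reward receive zero weight before the usual Bayes update on the observation.

import numpy as np

def belief_update_with_reward(b, a, o, r, T, O, R):
    """Bayes update of the belief over states after taking action a
    and observing both observation o and reward r.

    b : (S,) prior probability distribution over states
    T : (A, S, S) transition probabilities T[a, s, s']
    O : (A, S, Z) observation probabilities O[a, s', o]
    R : (S, A) known deterministic reward function
    """
    # Reward likelihood: states s whose reward R[s, a] differs from the
    # observed r could not have produced it, so they get zero weight.
    reward_lik = np.isclose(R[:, a], r).astype(float)      # shape (S,)
    # Propagate the reward-filtered belief through the transition model.
    predicted = (b * reward_lik) @ T[a]                    # shape (S,)
    # Weight by the likelihood of the observation in the new state.
    posterior = predicted * O[a][:, o]                     # shape (S,)
    z = posterior.sum()
    if z == 0.0:
        raise ValueError("observed (o, r) has zero probability under the model")
    return posterior / z

Dropping the reward_lik factor recovers the standard observation-only belief update, which is one way to compare an RI-policy against a normal policy on the same model.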
Keywords
Markov processes; linear quadratic control; stochastic systems; partially observable Markov decision process (POMDP); reward-information policy; state probability distribution; cost function; probability distribution; state estimation; state-space methods; uncertainty
fLanguage
English
Journal_Title
IEEE Transactions on Automatic Control
Publisher
IEEE
ISSN
0018-9286
Type
jour
DOI
10.1109/TAC.2007.894520
Filename
4154961