Title :
State aggregation in Markov decision processes
Author :
Ren, Zhiyuan ; Krogh, Bruce H.
Author_Institution :
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
We study state aggregation for Markov decision processes (MDPs) with long-run average-cost optimality criterion in this paper. The aggregation is based on a definition of an (εp, εf)-lumpable partition of the state space, where the difference between the control effect of any control action on any two states belonging to the same subset in the partition is bounded by εp for the state-transition effect and εf for the cost effect. The states in the same partition subset are treated into one meta-state to obtain an aggregated Markov chain. We then construct an aggregated MDP with average cost on the aggregated Markov chain. We develop an algorithm to find the solution to this aggregated MDP problem and show its performance is within some o(εp, εf) neighborhood of the optimal solution to the original MDP problem. In real applications, the (εp, εf)-lumpable partition is usually obtained empirically. However, we also study the problem of looking for the coarsest (εp, εf)-lumpable partition, i.e., the partition with minimum number of subsets, given εp and εf. We prove that this partitioning problem is in the time complexity class of P-hard, which is not easier than the original MDP problem in the class of P-complete.
Keywords :
Markov processes; cost optimal control; decision theory; state-space methods; suboptimal control; Markov decision processes; aggregated Markov chain; average cost optimality criterion; control action; control effect; cost effect; meta state; optimal solution; partition subset; partitioning problem; state aggregation; state space lumpable partition; state transition effect; time complexity; Aggregates; Context modeling; Cost function; Large-scale systems; Partitioning algorithms; Performance analysis; State-space methods;
Conference_Titel :
Decision and Control, 2002, Proceedings of the 41st IEEE Conference on
Print_ISBN :
0-7803-7516-5
DOI :
10.1109/CDC.2002.1184960