Title :
The complexity of Policy Iteration is exponential for discounted Markov Decision Processes
Author :
Hollanders, R. ; Delvenne, Jean-Charles ; Jungers, Raphael M.
Author_Institution :
Dept. of Math. Eng., UCLouvain, Louvain-la-Neuve, Belgium
Abstract :
The question of whether the Policy Iteration algorithm (PI) for solving stationary Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention over the last 25 years. Recently, a family of examples on which PI requires an exponential number of iterations to converge was proposed for the total-reward and average-reward criteria. On the other hand, PI was shown to run in strongly polynomial time on discounted-reward MDPs, but only when the discount factor is fixed beforehand. In this work, we show that PI needs an exponential number of steps to converge on discounted-reward MDPs with a general discount factor.
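For readers unfamiliar with the algorithm whose complexity the abstract discusses, the following is a minimal sketch of tabular Policy Iteration on a toy discounted MDP. The MDP, variable names, and parameters are illustrative assumptions, not taken from the paper; the paper's exponential lower-bound constructions are far larger and more intricate.

```python
import numpy as np

# Minimal tabular Policy Iteration sketch for a toy discounted MDP.
# P[a] is the transition matrix under action a; R[s, a] is the reward
# for taking action a in state s. The toy instance below is illustrative.

def policy_iteration(P, R, gamma=0.9, max_iters=1000):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead on the Q-values.
        q = R + gamma * np.array([P[a] @ v for a in range(n_actions)]).T
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v  # no improving switch exists: policy is optimal
        policy = new_policy
    return policy, v

# Toy 2-state, 2-action discounted MDP.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.1, 0.9], [0.7, 0.3]])]   # transitions under action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
policy, v = policy_iteration(P, R, gamma=0.9)
```

The question studied in the paper is how many improvement steps the loop above can take in the worst case before terminating; the result here is that, when the discount factor is part of the input, this number can grow exponentially in the size of the MDP.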
Keywords :
Markov processes; computational complexity; iterative methods; average-reward criteria; discounted Markov decision processes; discounted-reward MDP; exponential complexity; exponential number; general discount factor; policy iteration algorithm; polynomial complexity; polynomial time; stationary Markov decision processes; Complexity theory; Linear systems; Markov processes; Optimization; Polynomials; Upper bound; Vectors;
Conference_Title :
2012 IEEE 51st Annual Conference on Decision and Control (CDC)
Conference_Location :
Maui, HI
Print_ISBN :
978-1-4673-2065-8
Electronic_ISSN :
0743-1546
DOI :
10.1109/CDC.2012.6426485