Title : 
Relative value iteration for average reward semi-Markov control via simulation
         
        
        
            Author_Institution : 
Dept. of Eng. Manage. & Syst. Eng., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
         
        
        
        
        
        
            Abstract : 
This paper studies the semi-Markov decision process (SMDP) under the long-run average reward criterion in the simulation-based context. Using dynamic programming, a straightforward approach for solving this problem involves policy iteration; a value iteration approach for this problem involves a transformation that induces an additional computational burden. In the simulation-based context, however, where one seeks to avoid the transition probabilities needed in dynamic programming, value iteration forms a more convenient route to a solution. Hence, in this paper we present (to the best of our knowledge, for the first time) a relative value iteration algorithm for solving average reward SMDPs via simulation. The algorithm is a semi-Markov extension of an algorithm in the literature for the Markov decision process. Our numerical results with the new algorithm are very encouraging.
         
        
            Keywords : 
Markov processes; dynamic programming; iterative methods; probability; simulation; average reward SMDPs; average reward semi-Markov control; long-run average reward criterion; policy iteration; relative value iteration algorithm; semi-Markov decision process; simulation-based context; transition probabilities; value iteration approach; Algorithm design and analysis; Context; Equations; Mathematical model; Modeling
         
        
        
        
            Conference_Title : 
2013 Winter Simulation Conference (WSC)
         
        
            Conference_Location : 
Washington, DC
         
        
            Print_ISBN : 
978-1-4799-2077-8
         
        
        
            DOI : 
10.1109/WSC.2013.6721456