Title : 
Learning an Optimal Control Policy for a Markov Decision Process Under Linear Temporal Logic Specifications
         
        
            Author : 
Masaki Hiromoto;Toshimitsu Ushio
         
        
        
        
        
            Abstract : 
In this paper, we consider an uncertain Markov decision process (MDP) with a control cost and a linear temporal logic (LTL) control specification. We propose a reinforcement learning (RL) based method for designing an optimal control policy under which the controlled MDP satisfies the control specification with probability 1 and minimizes the expected discounted sum of the control costs. First, we construct a deterministic Rabin automaton (DRA) that accepts all and only the infinite words satisfying the LTL control specification. Second, we construct a product MDP of the MDP and the DRA to represent a dynamic control policy that satisfies the control specification. Third, we modify the product MDP so that RL can be applied to the design of an optimal control policy. The control action of the modified product MDP is a pair consisting of a pattern and an action, where the pattern is a set of actions. Moreover, we introduce a reward that represents both the satisfaction of the control specification and the minimal restrictiveness of the pattern. Finally, we propose an algorithm for designing an optimal control policy that consists of a two-step sequential decision-making process. In the first step, we select a pattern that maximizes the discounted sum of the reward. In the second step, we select, from the pattern chosen in the first step, an action that minimizes the expected discounted sum of the costs. We also present an illustrative example showing that the proposed algorithm can obtain an optimal control policy.
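The two-step decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that estimates of the discounted reward per pattern and of the discounted cost per action have already been learned and are stored in two lookup tables. The names `select_pattern_then_action`, `pattern_value`, and `action_cost` are hypothetical and do not come from the paper.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

State = Hashable            # e.g. a (MDP state, DRA state) pair of the product MDP
Action = str
Pattern = FrozenSet[Action]  # a pattern is a set of actions


def select_pattern_then_action(
    state: State,
    pattern_value: Dict[Tuple[State, Pattern], float],  # hypothetical learned reward estimates
    action_cost: Dict[Tuple[State, Action], float],     # hypothetical learned cost estimates
) -> Tuple[Pattern, Action]:
    """Pick a pattern by estimated reward, then an action in it by estimated cost."""
    # Step 1: among the non-empty patterns recorded for this state,
    # choose the one with the largest estimated discounted reward.
    candidates = [p for (s, p) in pattern_value if s == state and p]
    if not candidates:
        raise ValueError("no pattern recorded for this state")
    best_pattern = max(candidates, key=lambda p: pattern_value[(state, p)])

    # Step 2: restricted to the chosen pattern, choose the action with the
    # smallest estimated discounted cost (sorted for a deterministic tie-break).
    best_action = min(sorted(best_pattern), key=lambda a: action_cost[(state, a)])
    return best_pattern, best_action


# Usage with made-up numbers: action "c" is cheapest overall, but it is not
# in the highest-valued pattern, so it is not selected.
s = ("s0", "q0")
pattern_value = {
    (s, frozenset({"a"})): 0.5,
    (s, frozenset({"a", "b"})): 0.9,
    (s, frozenset({"c"})): 0.1,
}
action_cost = {(s, "a"): 2.0, (s, "b"): 1.0, (s, "c"): 0.1}
pattern, action = select_pattern_then_action(s, pattern_value, action_cost)
```

Separating the two lookups mirrors the ordering in the abstract: the specification-related choice (the pattern) is fixed first, and cost is optimized only among the actions that the pattern permits.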
         
        
            Keywords : 
"Optimal control","Markov processes","Process control","Decision making","Silicon","Bismuth","Safety"
         
        
        
            Conference_Title : 
Computational Intelligence, 2015 IEEE Symposium Series on
         
        
            Print_ISBN : 
978-1-4799-7560-0
         
        
        
            DOI : 
10.1109/SSCI.2015.87