Title : 
Reinforcement learning with supervision by combining multiple learnings and expert advices
         
        
            Author : 
Chang, Hyeong Soo
         
        
            Author_Institution : 
Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
         
        
        
        
            Abstract : 
In this paper, we provide a formal, coherent learning framework in which reinforcement learning is combined with multiple learnings and expert advice to accelerate the convergence of learning. Our approach is simply to use a nonstationary "potential-based reinforcement function" to shape the reinforcement signal given to the learning "base-agent". The base-agent employs SARSA(0) or adaptive asynchronous value iteration (VI). The supervised inputs to the base-agent, which come from "subagents" running other parallel, independent reinforcement learnings and, if available, from experts, are "merged" into the value of the potential-based reinforcement function. That value is then put into the update equation of SARSA(0) for the Q-function estimate, or of adaptive asynchronous VI for the optimal value function estimate. The resulting SARSA(0) and adaptive asynchronous VI each converge to an optimal policy.
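
The shaped update described in the abstract can be illustrated with a minimal sketch. It assumes the standard potential-based shaping form, gamma * Phi(s') - Phi(s), added to the reward inside a one-step SARSA update. The abstract does not state the paper's exact merge rule or update equation, so the function names, the averaging rule in `merged_potential`, and the parameters `alpha` and `gamma` are hypothetical choices made for illustration only.

```python
def merged_potential(advice, state):
    """Hypothetical merge rule (an assumption, not the paper's): average the
    value estimates that subagents and experts currently report for `state`."""
    return sum(estimate[state] for estimate in advice) / len(advice)


def shaped_sarsa0_update(q, s, a, r, s2, a2, phi_s, phi_s2, alpha=0.1, gamma=0.9):
    """Apply one one-step SARSA update with a potential-based shaping term.

    q       : dict mapping (state, action) to the current Q estimate
    phi_s   : potential of the current state s (may change over time)
    phi_s2  : potential of the next state s2 (may change over time)
    """
    # Potential-based shaping term added to the environment reward.
    shaping = gamma * phi_s2 - phi_s
    # Standard SARSA temporal-difference error, using the shaped reward.
    td_error = r + shaping + gamma * q[(s2, a2)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return q[(s, a)]
```

In use, the caller would recompute `phi_s` and `phi_s2` with `merged_potential` at every step from the advisers' latest estimates, which is what makes the potential nonstationary in this sketch.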
         
        
            Keywords : 
learning (artificial intelligence); software agents; Q-function estimate; adaptive asynchronous value iteration; expert advices; learning base-agent; multiple learnings; optimal value function estimate; parallel independent reinforcement learning; potential-based reinforcement function; reinforcement signal; Acceleration; Computer science; Convergence; Decision making; Equations; Intelligent agent; Intelligent robots; Learning; Linear programming; Statistics;
         
        
        
        
            Conference_Title : 
American Control Conference, 2006
         
        
            Conference_Location : 
Minneapolis, MN
         
        
            Print_ISBN : 
1-4244-0209-3
         
        
            Electronic_ISBN : 
1-4244-0209-3
         
        
        
            DOI : 
10.1109/ACC.2006.1657371