Title : 
Self-modifying reinforcement learning

Author_Institution :
IST Res., Ningbo Univ., Zhejiang, China

Abstract :
We describe several experiments with reinforcement learning systems based on the technique of incremental self-improvement (IS). IS uses the success-story algorithm (SSA) to undo unrewarding policy changes computed by self-modifying policies. The experiments demonstrate the advantages of IS over stochastic hill climbing and TD Q-learning in noisy environments given limited computational resources.

Keywords :
learning (artificial intelligence); learning automata; stochastic automata; TD Q-learning; incremental self-improvement; noisy environments; self-modifying reinforcement learning; stochastic hill climbing; success-story algorithm; Acceleration; Genetic algorithms; Learning; Monitoring; Noise measurement; Performance evaluation; Stochastic processes; Testing; Time measurement; Working environment noise;

Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on

Print_ISBN :
0-7803-7508-4

DOI :
10.1109/ICMLC.2002.1175418