Title :
Backtracking for More Efficient Large Scale Dynamic Programming
Author :
Tripp, C. ; Shachter, R.
Author_Institution :
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Abstract :
Reinforcement learning algorithms are widely used to generate policies for complex Markov decision processes. We introduce backtracking, a modification to reinforcement learning algorithms that can significantly improve their performance, particularly for off-line policy generation. Backtracking waits to perform update calculations until the successor's value has been updated, allowing immediate reuse of update calculations. We demonstrate the effectiveness of backtracking on two benchmark processes using both Q-learning and real-time dynamic programming.
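The abstract's core idea, deferring each update until the successor's value has itself been updated, can be illustrated with a minimal Q-learning sketch that replays an episode's transitions in reverse order. This is only an illustration consistent with the abstract, not the paper's exact algorithm; the env object and its reset/step/actions methods are hypothetical placeholders.

```python
import random
from collections import defaultdict

def q_learning_episode_with_backtracking(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode, then apply Q-learning updates in reverse order.

    Updating the final transition first means each earlier update sees the
    already-refreshed value of its successor state, so a single pass over the
    episode can propagate new value information many steps backward.
    """
    # Collect the trajectory first (hypothetical env API: reset/step/actions).
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state), key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state

    # Backtrack: perform the deferred updates from the end of the episode,
    # so each update's successor value has already been updated.
    for state, action, reward, next_state, done in reversed(trajectory):
        if done:
            target = reward
        else:
            target = reward + gamma * max(Q[(next_state, a)] for a in env.actions(next_state))
        Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q

# Usage sketch: Q = defaultdict(float); repeatedly call
# q_learning_episode_with_backtracking(env, Q) to train.
```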
Keywords :
Markov processes; decision making; dynamic programming; learning (artificial intelligence); Q-learning; complex Markov decision processes; large scale dynamic programming; offline policy generation; real-time dynamic programming; reinforcement learning algorithms; Dynamic programming; Heuristic algorithms; Learning; Mathematical model; Process control; Real-time systems; Q-Learning; backtracking; dynamic programming; experience replay; reinforcement learning;
Conference_Title :
2012 11th International Conference on Machine Learning and Applications (ICMLA)
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
DOI :
10.1109/ICMLA.2012.63