DocumentCode
589275
Title
Backtracking for More Efficient Large Scale Dynamic Programming
Author
Tripp, C. ; Shachter, R.
Author_Institution
Dept. of Electr. Eng., Stanford Univ., Stanford, CA, USA
Volume
1
fYear
2012
fDate
12-15 Dec. 2012
Firstpage
338
Lastpage
343
Abstract
Reinforcement learning algorithms are widely used to generate policies for complex Markov decision processes. We introduce backtracking, a modification to reinforcement learning algorithms that can significantly improve their performance, particularly for off-line policy generation. Backtracking waits to perform update calculations until the successor's value has been updated, allowing immediate reuse of update calculations. We demonstrate the effectiveness of backtracking on two benchmark processes using both Q-learning and real-time dynamic programming.
Keywords
Markov processes; decision making; dynamic programming; learning (artificial intelligence); Q-learning; complex Markov decision processes; large scale dynamic programming; offline policy generation; real-time dynamic programming; reinforcement learning algorithms; Dynamic programming; Erbium; Heuristic algorithms; Learning; Mathematical model; Process control; Real-time systems; Q-Learning; backtracking; dynamic programming; experience replay; reinforcement learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location
Boca Raton, FL
Print_ISBN
978-1-4673-4651-1
Type
conf
DOI
10.1109/ICMLA.2012.63
Filename
6406685