DocumentCode :
3756770
Title :
Sparse Temporal Difference Learning via Alternating Direction Method of Multipliers
Author :
Nikos Tsipinakis;James D.B. Nelson
Author_Institution :
Dept. of Stat. Sci., Univ. Coll. London, London, UK
fYear :
2015
Firstpage :
220
Lastpage :
225
Abstract :
Recent work in offline reinforcement learning has focused on efficient algorithms that incorporate feature selection, via l1-regularization, into Bellman-operator fixed-point estimators. These developments mean that over-fitting can now be avoided when the number of samples is small relative to the number of features. However, it remains unclear whether existing algorithms can offer good approximations for the tasks of policy evaluation and improvement. In this paper, we propose a new algorithm for approximating the fixed point based on the Alternating Direction Method of Multipliers (ADMM). We demonstrate, with experimental results, that the proposed algorithm is more stable for policy iteration than prior work. Furthermore, we derive a theoretical result stating that the proposed algorithm obtains a solution that satisfies the optimality conditions for the fixed-point problem.
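To illustrate the style of method the abstract describes, the following is a minimal sketch of generic ADMM applied to an l1-regularized least-squares (lasso-type) problem, min_x 0.5||Ax - b||^2 + lam*||x||_1. It is not the paper's specific TD estimator; the function names, the choice of penalty parameter rho, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """Generic ADMM for min_x 0.5||Ax - b||^2 + lam*||x||_1 (illustrative sketch)."""
    n = A.shape[1]
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor (A^T A + rho*I) once; it is reused in every x-update.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    for _ in range(n_iter):
        # x-update: solve (A^T A + rho*I) x = A^T b + rho*(z - u)
        q = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, q))
        # z-update: proximal step enforcing sparsity
        z = soft_threshold(x + u, lam / rho)
        # dual update
        u = u + x - z
    return z
```

In a TD-learning setting one would, roughly speaking, build A and b from feature matrices and rewards (e.g. LSTD-style quantities), but the paper's actual fixed-point formulation should be consulted for the correct operators.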
Keywords :
"Approximation algorithms","Context","Optimization","Function approximation","Learning (artificial intelligence)","Prediction algorithms","Linear programming"
Publisher :
ieee
Conference_Titel :
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
Type :
conf
DOI :
10.1109/ICMLA.2015.36
Filename :
7424312