DocumentCode :
133921
Title :
Research of an off-policy BF optimization algorithm based on cross-entropy method
Author :
Yuchen Fu ; Xuwen Song
Author_Institution :
Dept. of Inf. Technol., Suzhou Ind. Park Inst. of Services Outsourcing, Suzhou, China
fYear :
2014
fDate :
3-7 Aug. 2014
Firstpage :
861
Lastpage :
866
Abstract :
In the reinforcement learning task, off-policy algorithms that approximately evaluate state values face the problem of high evaluation error and are sensitive to the distribution of the behavior policy. To address these problems, a basis function optimization method for the off-policy scenario is proposed. The algorithm takes the Bellman error of the target policy, computed with off-policy prediction algorithms, as the objective function, and then adjusts the placement and shape of the basis functions using cross-entropy optimization. Experimental results on the grid world show that the algorithm effectively reduces the evaluation error and improves the approximation. Additionally, the algorithm can be easily extended to problems with large state spaces.
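The approach described in the abstract can be illustrated with a generic sketch: use the cross-entropy method (CEM) to search over the centers and widths of Gaussian radial basis functions, scoring each candidate basis by the mean squared Bellman residual of the value function fitted with it. This is a minimal single-policy toy on a 1-D state space, not the paper's off-policy algorithm; all function names and the LSTD-style fitting step are illustrative assumptions.

```python
import numpy as np

def rbf_features(states, centers, widths):
    # Gaussian radial basis functions evaluated at each 1-D state.
    return np.exp(-((states[:, None] - centers[None, :]) ** 2)
                  / (2.0 * widths[None, :] ** 2))

def bellman_error(params, states, next_states, rewards, gamma=0.9):
    # Objective: mean squared Bellman residual of the value function
    # fitted with the candidate basis. The flat parameter vector holds
    # the RBF centers followed by their widths (kept positive).
    k = len(params) // 2
    centers, widths = params[:k], np.abs(params[k:]) + 1e-3
    phi = rbf_features(states, centers, widths)
    phi_next = rbf_features(next_states, centers, widths)
    # LSTD-style normal equations for the value weights (illustrative).
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    residual = rewards + gamma * (phi_next @ w) - phi @ w
    return np.mean(residual ** 2)

def cem_optimize(objective, dim, iters=30, pop=50, elite_frac=0.2, seed=0):
    # Cross-entropy method: sample candidates from a Gaussian, keep the
    # elite fraction with the lowest objective, refit mean and std.
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([objective(s) for s in samples])
        elite = samples[np.argsort(scores)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu, objective(mu)

# Toy chain: states drift right, reward grows with the state.
states = np.linspace(0.0, 1.0, 20)
next_states = np.clip(states + 0.05, 0.0, 1.0)
rewards = states ** 2
obj = lambda p: bellman_error(p, states, next_states, rewards)
k = 4  # number of basis functions
params_opt, err_opt = cem_optimize(obj, dim=2 * k)
```

Here CEM plays the role the abstract assigns it: the basis parameters, not the value weights, are the search variables, and the Bellman error is the score that drives the elite selection.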
Keywords :
learning (artificial intelligence); optimisation; Bellman error; approximation; basis function optimization method; cross-entropy optimization method; evaluation error reduction; objective function; off-policy BF optimization algorithm; off-policy prediction algorithm; reinforcement learning task; Approximation algorithms; Function approximation; Learning (artificial intelligence); Linear programming; Optimization; Prediction algorithms; basis function optimization; cross-entropy optimization; off-policy learning;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
World Automation Congress (WAC), 2014
Conference_Location :
Waikoloa, HI
Type :
conf
DOI :
10.1109/WAC.2014.6936177
Filename :
6936177