DocumentCode :
493372
Title :
Inferring bounds on the performance of a control policy from a sample of trajectories
Author :
Fonteneau, Raphael ; Murphy, Susan ; Wehenkel, Louis ; Ernst, Damien
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Liege, Liege
fYear :
2009
fDate :
March 30 - April 2, 2009
Firstpage :
117
Lastpage :
123
Abstract :
We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories, each recording state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, an algorithm that is polynomial in the sample size and in the length of the optimization horizon is derived to compute these bounds, and their tightness is characterized in terms of the sample density.
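A minimal illustrative sketch (not the authors' reference implementation) of the kind of bound computation the abstract describes: from a sample of one-step transitions (x, u, r, x') of a deterministic Lipschitz-continuous system, a lower bound on the T-step return of a fixed deterministic policy can be maximized over sequences of sampled transitions with a Viterbi-like dynamic program, which is polynomial in the sample size and the horizon. The names `policy`, `lipschitz_constants`, and `lower_bound` are assumptions made for this sketch, not identifiers from the paper.

```python
import numpy as np


def lower_bound(x0, sample, policy, lipschitz_constants, T):
    """Maximal lower bound on the T-step return of `policy` started in `x0`.

    sample              : list of transitions (x, u, r, x_next).
    policy              : deterministic mapping state -> action.
    lipschitz_constants : L[t] bounds the sensitivity of the value accumulated
                          from step t onward to the state-action pair (assumed
                          known, e.g. derived from Lipschitz constants of the
                          dynamics, reward function, and policy).
    """
    def gap(state, action, x_s, u_s):
        # Distance between the pair the policy would visit and a sampled pair.
        return (np.linalg.norm(np.atleast_1d(state) - np.atleast_1d(x_s))
                + np.linalg.norm(np.atleast_1d(action) - np.atleast_1d(u_s)))

    n = len(sample)
    # best[l]: best bound on the rewards collected from the current step to the
    # end, given that the current step reuses sampled transition l (the penalty
    # for entering l is charged by its predecessor).
    best = np.array([r for (_, _, r, _) in sample], dtype=float)

    for t in range(T - 2, -1, -1):
        new_best = np.empty(n)
        for l, (_, _, r_l, y_l) in enumerate(sample):
            u_next = policy(y_l)
            # Choose the next sampled transition, paying a Lipschitz penalty
            # proportional to how far it lies from the state-action pair the
            # policy would actually reach.
            new_best[l] = r_l + max(
                best[m] - lipschitz_constants[t + 1] * gap(y_l, u_next, x_m, u_m)
                for m, (x_m, u_m, _, _) in enumerate(sample)
            )
        best = new_best

    # Penalty for the very first transition is measured from the start state.
    u0 = policy(x0)
    return max(
        best[l] - lipschitz_constants[0] * gap(x0, u0, x_l, u_l)
        for l, (x_l, u_l, _, _) in enumerate(sample)
    )
```

The nested maximization over sampled transitions at each of the T steps gives the O(n^2 T) cost consistent with the polynomial complexity stated in the abstract.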
Keywords :
continuous systems; optimal control; optimisation; polynomials; Lipschitz continuous; control policy; optimization horizon; polynomial algorithm; reward function; trajectories sample; Artificial intelligence; Biomedical engineering; Computational modeling; Control systems; Dynamic programming; Fingers; Optimal control; Polynomials; Predictive models; Upper bound;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09. IEEE Symposium on
Conference_Location :
Nashville, TN
Print_ISBN :
978-1-4244-2761-1
Type :
conf
DOI :
10.1109/ADPRL.2009.4927534
Filename :
4927534