DocumentCode :
728496
Title :
Online learning of feasible strategies in unknown environments
Author :
Paternain, Santiago ; Ribeiro, Alejandro
Author_Institution :
Dept. of Electr. & Syst. Eng., Univ. of Pennsylvania, Philadelphia, PA, USA
fYear :
2015
fDate :
1-3 July 2015
Firstpage :
4231
Lastpage :
4238
Abstract :
An environment is defined as a set of constraint functions that vary arbitrarily over time. An agent wants to select feasible actions that keep all the constraints negative, but must do so causally; that is, the dynamical system that determines actions is such that only their time derivatives can depend on the current constraints. An environment is said to be viable if there exists an action that satisfies the constraints for all times. The fit of a trajectory is defined as a vector that integrates the constraint violations over time and is used to measure the extent to which a policy succeeds in learning feasible actions. An online saddle point controller is proposed to control fit and is shown to do so under minimal technical conditions. The online saddle point controller pushes actions along a linear combination of the negative gradients of the constraints and dynamically adapts the coefficients of this linear combination to find appropriate weightings. Concepts are illustrated throughout with the problem of a shepherd who wants to stay close to all sheep in a herd. Numerical experiments show that the controller allows the shepherd to do so.
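The controller described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a forward-Euler discretization of the continuous-time dynamics, and the names `sheep_positions`, `simulate`, `radius`, `gain`, `dt`, and `horizon` are hypothetical, as are the herd trajectories. Each constraint is taken to be the squared distance from the shepherd to one sheep minus a squared radius, so it is negative when the shepherd is close enough to that sheep. The action moves along the multiplier-weighted sum of negative constraint gradients, and each multiplier integrates its own constraint value while being kept nonnegative.

```python
import math


def sheep_positions(t):
    """Hypothetical herd: three sheep circling slowly around fixed centres."""
    centres = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    return [(cx + 0.2 * math.cos(t + i), cy + 0.2 * math.sin(t + i))
            for i, (cx, cy) in enumerate(centres)]


def simulate(radius=1.0, gain=1.0, dt=0.01, horizon=20.0):
    """Euler-discretized saddle point dynamics for the shepherd example.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns the final action, the final multipliers, and the accumulated fit.
    """
    x = [2.0, 2.0]          # shepherd position (the action)
    lam = [0.0, 0.0, 0.0]   # one multiplier per constraint (sheep)
    fit = [0.0, 0.0, 0.0]   # running integral of each constraint value
    for k in range(int(horizon / dt)):
        sheep = sheep_positions(k * dt)
        # Constraint i: squared distance to sheep i minus radius^2 (feasible if <= 0).
        f = [(x[0] - sx) ** 2 + (x[1] - sy) ** 2 - radius ** 2
             for sx, sy in sheep]
        # Multiplier-weighted sum of constraint gradients, grad f_i = 2 (x - y_i).
        gx = sum(l * 2.0 * (x[0] - sx) for l, (sx, _) in zip(lam, sheep))
        gy = sum(l * 2.0 * (x[1] - sy) for l, (_, sy) in zip(lam, sheep))
        # Primal step: descend along the weighted gradients.
        x = [x[0] - dt * gain * gx, x[1] - dt * gain * gy]
        # Dual step: ascend along the constraint values, clipped at zero.
        lam = [max(0.0, l + dt * gain * fi) for l, fi in zip(lam, f)]
        # Fit: Euler approximation of the time integral of each constraint.
        fit = [acc + dt * fi for acc, fi in zip(fit, f)]
    return x, lam, fit
```

Calling `simulate()` returns the final shepherd position, the multipliers, and the fit vector; inspecting how the fit entries evolve for different values of `gain` is one way to explore the trade-off the abstract describes.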
Keywords :
learning systems; linear systems; constraint functions; constraint negative gradients; online learning; online saddle point controller; unknown environments; Convex functions; Feedback loop; Force; Heuristic algorithms; Polynomials; Time factors; Trajectory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
American Control Conference (ACC), 2015
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4799-8685-9
Type :
conf
DOI :
10.1109/ACC.2015.7171994
Filename :
7171994