Title :
Online learning of feasible strategies in unknown environments
Author :
Paternain, Santiago ; Ribeiro, Alejandro
Author_Institution :
Dept. of Electr. & Syst. Eng., Univ. of Pennsylvania, Philadelphia, PA, USA
Abstract :
An environment is defined as a set of constraint functions that vary arbitrarily over time. An agent wants to select feasible actions that keep all the constraints negative, but must do so causally; i.e., the dynamical system that determines actions is such that only their time derivatives can depend on the current constraints. An environment is said to be viable if there exists an action that satisfies the constraints at all times. The fit of a trajectory is defined as a vector that integrates the constraint violations over time and is used to measure the extent to which a policy succeeds in learning feasible actions. An online saddle point controller is proposed to control fit and is shown to do so under minimal technical conditions. The online saddle point controller pushes actions along a linear combination of the negative gradients of the constraints and dynamically adapts the coefficients of this linear combination to find appropriate weightings. Concepts are illustrated throughout with the problem of a shepherd that wants to stay close to all sheep in a herd. Numerical experiments show that the controller allows the shepherd to do so.
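The controller described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the forward-Euler discretization, the two-sheep herd and its circular motion, the constraint form f_i(t, x) = ||x - y_i(t)||^2 - radius^2, the projection of the coefficients onto nonnegative values, and the names `sheep_positions`, `run_controller`, `gain`, `dt`, and `horizon`. Fit is approximated by a Riemann sum of the constraint values.

```python
import math


def sheep_positions(t):
    # Hypothetical herd: two sheep moving on small circles (illustrative only).
    return [(1.0 + 0.2 * math.cos(t), 0.2 * math.sin(t)),
            (-1.0 + 0.2 * math.sin(t), 0.2 * math.cos(t))]


def constraints(t, x, radius):
    # Assumed constraint: f_i(t, x) = ||x - y_i(t)||^2 - radius^2, feasible when <= 0.
    return [(x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2 - radius ** 2
            for y in sheep_positions(t)]


def gradients(t, x):
    # Gradient of each assumed constraint with respect to the action x.
    return [(2.0 * (x[0] - y[0]), 2.0 * (x[1] - y[1]))
            for y in sheep_positions(t)]


def run_controller(x0, radius=1.5, gain=0.5, dt=0.01, horizon=20.0):
    x = list(x0)       # shepherd position (the action)
    lam = [0.0, 0.0]   # coefficients of the linear combination of gradients
    fit = [0.0, 0.0]   # running integral of each constraint value
    for k in range(int(horizon / dt)):
        t = k * dt
        f = constraints(t, x, radius)
        g = gradients(t, x)
        # Accumulate fit as a Riemann sum of the constraint values.
        fit = [F + fi * dt for F, fi in zip(fit, f)]
        # Action step: move along the weighted negative constraint gradients.
        x = [x[j] - gain * dt * sum(l * gi[j] for l, gi in zip(lam, g))
             for j in range(2)]
        # Coefficient step: grow with the constraint value, kept nonnegative.
        lam = [max(0.0, l + gain * dt * fi) for l, fi in zip(lam, f)]
    return x, lam, fit


if __name__ == "__main__":
    x, lam, fit = run_controller((4.0, 3.0))
    print("final position:", x)
    print("final coefficients:", lam)
    print("fit:", fit)
```

In this sketch the update uses only the constraints observed at the current step, which is one way to respect the causality requirement; a coefficient grows while its constraint is violated, so the corresponding gradient gains weight in the action update.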
Keywords :
learning systems; linear systems; constraint functions; constraint negative gradients; online learning; online saddle point controller; unknown environments; Convex functions; Feedback loop; Force; Heuristic algorithms; Polynomials; Time factors; Trajectory
Conference_Title :
American Control Conference (ACC), 2015
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4799-8685-9
DOI :
10.1109/ACC.2015.7171994