DocumentCode
2498116
Title
Agent self-assessment: Determining policy quality without execution
Author
Hans, Alexander ; Duell, Siegmund ; Udluft, Steffen
Author_Institution
Neuroinformatics & Cognitive Robot. Lab., Ilmenau Univ. of Technol., Ilmenau, Germany
fYear
2011
fDate
11-15 April 2011
Firstpage
84
Lastpage
90
Abstract
With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policy's quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value function's uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications.
Keywords
Markov processes; control engineering computing; learning (artificial intelligence); multi-agent systems; optimal control; Markov decision process; agent self-assessment; complex technical system; data-efficient reinforcement learning; discrete MDP; optimal control; policy quality; uncertainty propagation; Approximation algorithms; Benchmark testing; Equations; Histograms; Machine learning; Markov processes; Uncertainty; Markov decision processes; autonomous agent; policy quality; reinforcement learning; robustness; self-assessment; uncertainty propagation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011 IEEE Symposium on
Conference_Location
Paris
Print_ISBN
978-1-4244-9887-1
Type
conf
DOI
10.1109/ADPRL.2011.5967358
Filename
5967358