DocumentCode :
3514711
Title :
Scalable reward learning from demonstration
Author :
Michini, Bernard ; Cutler, Mark ; How, Jonathan P.
Author_Institution :
Aerosp. Controls Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
fYear :
2013
fDate :
6-10 May 2013
Firstpage :
303
Lastpage :
308
Abstract :
Reward learning from demonstration is the task of inferring the intents or goals of an agent demonstrating a task. Inverse reinforcement learning methods use the Markov decision process (MDP) framework to learn rewards, but typically scale poorly because they rely on computing optimal value functions. Several key modifications are made to a previously developed Bayesian nonparametric inverse reinforcement learning algorithm that avoid computing an optimal value function and no longer require discretization of the state or action spaces. Experimental results demonstrate the ability of the resulting algorithm to scale to larger problems and to learn in domains with continuous demonstrations.
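As context for the abstract, the following is a minimal illustrative sketch (in Python) of the general idea it describes: demonstration points are partitioned among candidate subgoal rewards with a Chinese restaurant process (CRP) prior, and the expensive optimal value function is replaced by a cheap per-query action-quality approximation. This is not the authors' implementation; the 1-D grid world, the distance-based q_approx stand-in, and the parameters ALPHA and BETA are all assumptions made for illustration.

```python
# Minimal sketch of CRP-based subgoal assignment with an approximate
# action likelihood. All names and parameters here are illustrative,
# not from the paper.
import math
import random

random.seed(0)

GRID = 10                      # 1-D grid of states 0..9
ACTIONS = (-1, +1)             # move left / move right
ALPHA = 1.0                    # CRP concentration parameter (assumed)
BETA = 5.0                     # softmax confidence of the demonstrator (assumed)

def q_approx(state, action, subgoal):
    """Approximate action quality: progress toward the subgoal after one
    step. Stands in for the expensive optimal value function."""
    nxt = min(max(state + action, 0), GRID - 1)
    return -abs(subgoal - nxt)

def action_likelihood(state, action, subgoal):
    """Softmax likelihood of the observed action under a candidate subgoal."""
    num = math.exp(BETA * q_approx(state, action, subgoal))
    den = sum(math.exp(BETA * q_approx(state, a, subgoal)) for a in ACTIONS)
    return num / den

# Toy demonstration: walk right to state 7, then left toward state 2.
demo = [(s, +1) for s in range(0, 7)] + [(s, -1) for s in range(7, 2, -1)]

# One Gibbs sweep over partition assignments z_i (CRP prior x likelihood).
z = [0] * len(demo)                      # start with everyone in cluster 0
subgoals = {0: 7}                        # cluster -> candidate subgoal state
for i, (s, a) in enumerate(demo):
    counts = {}
    for j, zj in enumerate(z):
        if j != i:
            counts[zj] = counts.get(zj, 0) + 1
    weights, labels = [], []
    for k, n_k in counts.items():        # existing clusters
        weights.append(n_k * action_likelihood(s, a, subgoals[k]))
        labels.append(k)
    new_k = max(subgoals) + 1            # new cluster: subgoal drawn from demo states
    g = random.choice([p[0] for p in demo])
    weights.append(ALPHA * action_likelihood(s, a, g))
    labels.append(new_k)
    pick = random.choices(labels, weights=weights)[0]
    if pick == new_k:
        subgoals[new_k] = g
    z[i] = pick

print("assignments:", z)
print("subgoals   :", {k: v for k, v in subgoals.items() if k in z})
```

Because q_approx costs O(1) per query, the sampler never runs value iteration over a discretized state space, which mirrors the scalability argument the abstract makes.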
Keywords :
Bayes methods; Markov processes; intelligent robots; learning (artificial intelligence); nonparametric statistics; Bayesian nonparametric inverse reinforcement learning algorithm; MDP framework; Markov decision process framework; inverse reinforcement learning method; optimal value functions; scalable reward learning from demonstration; Market research; Programming
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Robotics and Automation (ICRA), 2013 IEEE International Conference on
Conference_Location :
Karlsruhe, Germany
ISSN :
1050-4729
Print_ISBN :
978-1-4673-5641-1
Type :
conf
DOI :
10.1109/ICRA.2013.6630592
Filename :
6630592