Title :
Scalable reward learning from demonstration
Author :
Michini, Bernard ; Cutler, Mark ; How, Jonathan P.
Author_Institution :
Aerosp. Controls Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
Reward learning from demonstration is the task of inferring the intents or goals of an agent demonstrating a task. Inverse reinforcement learning methods utilize the Markov decision process (MDP) framework to learn rewards, but typically scale poorly because they rely on computing optimal value functions. Several key modifications are made to a previously developed Bayesian nonparametric inverse reinforcement learning algorithm that avoid computing an optimal value function and no longer require discretization of the state or action spaces. Experimental results demonstrate the ability of the resulting algorithm to scale to larger problems and to learn in domains with continuous demonstrations.
Keywords :
Bayes methods; Markov processes; intelligent robots; learning (artificial intelligence); nonparametric statistics; Bayesian nonparametric inverse reinforcement learning algorithm; MDP framework; Markov decision process framework; inverse reinforcement learning method; optimal value functions; scalable reward learning from demonstration;
Conference_Titel :
2013 IEEE International Conference on Robotics and Automation (ICRA)
Conference_Location :
Karlsruhe, Germany
Print_ISBN :
978-1-4673-5641-1
DOI :
10.1109/ICRA.2013.6630592