Title :
Guiding Autonomous Agents to Better Behaviors through Human Advice
Author :
Kunapuli, Gautam ; Odom, Phillip ; Shavlik, Jude W. ; Natarajan, Sriraam
Author_Institution :
Dept. of Biostat. & Med. Inf., Univ. of Wisconsin-Madison, Madison, WI, USA
Abstract :
Inverse Reinforcement Learning (IRL) is an approach to domain-reward discovery from demonstration, in which an agent recovers the reward function of a Markov decision process by observing an expert acting in the domain. The standard setting assumes that the expert acts (nearly) optimally and that a large number of trajectories, i.e., training examples, are available for reward discovery (and, consequently, for learning domain behavior). Neither assumption is practical: trajectories are often noisy, and examples can be scarce. Our novel approach incorporates advice-giving into the IRL framework to address these issues. Inspired by preference elicitation, a domain expert provides advice on states and actions (features) by stating preferences over them. We evaluate our approach on several domains and show that, with small amounts of targeted preference advice, learning is possible from noisy demonstrations and requires far fewer trajectories than learning from trajectories alone.
Keywords :
Markov processes; decision theory; learning (artificial intelligence); software agents; IRL framework; Markov decision process; autonomous agents; domain expert; domain-reward discovery; human advice; inverse reinforcement learning; learning domain behavior; preference elicitation; reward function; data mining; trajectory
Conference_Title :
2013 IEEE 13th International Conference on Data Mining (ICDM)
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.79