Title :
Reinforcement learning for spoken dialogue systems using off-policy natural gradient method
Author_Institution :
Fac. of Math. & Phys., Charles Univ. in Prague, Prague, Czech Republic
Abstract :
Reinforcement learning methods have been successfully used to optimise dialogue strategies in statistical dialogue systems. Typically, reinforcement learning techniques learn on-policy, i.e., the dialogue strategy is updated online while the system is interacting with a user. An alternative to this approach is off-policy reinforcement learning, which estimates an optimal dialogue strategy offline from a fixed corpus of previously collected dialogues. This paper proposes a novel off-policy reinforcement learning method based on natural policy gradients and importance sampling. The algorithm is evaluated on a spoken dialogue system in the tourist information domain. The experiments indicate that the proposed method learns a dialogue strategy that significantly outperforms the baseline handcrafted dialogue policy.
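For readers unfamiliar with the technique named in the abstract, the sketch below illustrates, in Python, how one off-policy natural policy gradient update can be estimated from a fixed corpus using per-episode importance sampling. It is not the paper's implementation: the softmax policy, one-hot feature map, toy corpus, reward, and hyperparameters are all illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of off-policy natural policy
# gradient with per-episode importance sampling over a fixed corpus.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 4, 3
DIM = N_STATES * N_ACTIONS  # one-hot (state, action) features

def features(s, a):
    phi = np.zeros(DIM)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def action_probs(theta, s):
    logits = np.array([theta @ features(s, a) for a in range(N_ACTIONS)])
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def score(theta, s, a):
    # grad_theta log pi_theta(a | s) for the softmax policy
    p = action_probs(theta, s)
    return features(s, a) - sum(p[b] * features(s, b) for b in range(N_ACTIONS))

def make_corpus(n_episodes=500, horizon=5):
    # Stand-in for logged dialogues: (state, action, behaviour-policy
    # probability) triples plus an episode return, collected once under
    # a fixed (here uniform) behaviour policy.
    corpus = []
    for _ in range(n_episodes):
        steps, ret = [], 0.0
        for _ in range(horizon):
            s = int(rng.integers(N_STATES))
            a = int(rng.integers(N_ACTIONS))
            steps.append((s, a, 1.0 / N_ACTIONS))
            ret += 1.0 if a == s % N_ACTIONS else 0.0  # toy reward
        corpus.append((steps, ret))
    return corpus

def natural_gradient_step(theta, corpus, lr=0.2, reg=1e-3):
    grad = np.zeros(DIM)
    fisher = np.zeros((DIM, DIM))
    for steps, ret in corpus:
        w = 1.0                   # per-episode importance weight
        ep_score = np.zeros(DIM)  # sum of per-step score vectors
        for s, a, p_behaviour in steps:
            w *= action_probs(theta, s)[a] / p_behaviour
            ep_score += score(theta, s, a)
        grad += w * ret * ep_score  # importance-weighted REINFORCE term
        for s, a, _ in steps:
            sc = score(theta, s, a)
            fisher += w * np.outer(sc, sc)  # weighted Fisher estimate
    grad /= len(corpus)
    fisher = fisher / len(corpus) + reg * np.eye(DIM)
    # Natural gradient: precondition the vanilla gradient by F^{-1}.
    return theta + lr * np.linalg.solve(fisher, grad)

theta = np.zeros(DIM)
corpus = make_corpus()
for _ in range(20):
    theta = natural_gradient_step(theta, corpus)
```

In practice, importance weights over long dialogues have high variance and are typically clipped or self-normalised; the paper's actual estimator, features, and dialogue representation differ from this toy setup.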
Keywords :
gradient methods; importance sampling; interactive systems; learning (artificial intelligence); optimisation; speech-based user interfaces; travel industry; dialogue strategy optimisation; off-policy natural gradient method; off-policy reinforcement learning method; optimal dialogue strategy; spoken dialogue systems; statistical dialogue systems; tourist information domain; History; Learning; Linear approximation; Stochastic processes; Training; POMDP; dialogue management; off-policy reinforcement learning; policy gradient methods
Conference_Title :
2012 IEEE Spoken Language Technology Workshop (SLT)
Conference_Location :
Miami, FL, USA
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424161