Title :
Off-policy learning in large-scale POMDP-based dialogue systems
Author :
Daubigney, Lucie ; Geist, Matthieu ; Pietquin, Olivier
Author_Institution :
IMS, Supelec, Metz, France
Abstract :
Reinforcement learning (RL) is now part of the state of the art in the domain of spoken dialogue systems (SDS) optimisation. The best-performing RL methods, such as those based on Gaussian Processes, require testing small changes to the policy in order to assess whether they are improvements or degradations. This process is called on-policy learning. Nevertheless, it can result in system behaviours that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, learn off-policy. Such methods usually fail to scale up and are thus not suited for real-world systems. In this contribution, a sample-efficient, online and off-policy RL algorithm is proposed to learn an optimal policy. This algorithm is combined with a compact non-linear value function representation (namely a multi-layer perceptron), enabling it to handle large-scale systems.
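To make the abstract's distinction concrete, the following is a minimal sketch of off-policy learning with a multi-layer perceptron value function: Q-learning updates are applied to transitions generated by a fixed, non-optimal behaviour policy, while the greedy target policy is improved. This is an illustrative assumption, not the algorithm proposed in the paper; the toy chain environment, the one-hidden-layer network and all hyper-parameters are invented for the example.

```python
# Illustrative sketch only (not the paper's algorithm): off-policy Q-learning
# with a small multi-layer perceptron as Q-function approximator, trained from
# transitions collected by a fixed, non-optimal behaviour policy.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GAMMA, LR, HIDDEN = 10, 2, 0.95, 0.01, 32

def encode(s):
    # One-hot state encoding fed to the MLP.
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

# MLP parameters: one hidden tanh layer, linear output of |A| Q-values.
W1 = rng.normal(0, 0.1, (HIDDEN, N_STATES)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_ACTIONS, HIDDEN)); b2 = np.zeros(N_ACTIONS)

def q_values(s):
    h = np.tanh(W1 @ encode(s) + b1)
    return W2 @ h + b2, h

def step(s, a):
    # Toy chain: action 1 moves right (reward 1 at the end), action 0 resets.
    if a == 1:
        s2 = min(s + 1, N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
    else:
        s2, r = 0, 0.0
    return s2, r

def behaviour_policy(s):
    # Behaviour policy: uniformly random, i.e. acceptable but far from optimal.
    return rng.integers(N_ACTIONS)

s = 0
for t in range(20000):
    a = behaviour_policy(s)
    s2, r = step(s, a)
    q_s, h = q_values(s)
    q_s2, _ = q_values(s2)
    # Off-policy target: bootstrap on the greedy (target-policy) action.
    target = r + GAMMA * np.max(q_s2)
    td_error = target - q_s[a]
    # Gradient of 0.5 * td_error^2 w.r.t. the MLP parameters (manual backprop).
    grad_out = np.zeros(N_ACTIONS); grad_out[a] = -td_error
    grad_h = (W2.T @ grad_out) * (1 - h ** 2)
    W2 -= LR * np.outer(grad_out, h); b2 -= LR * grad_out
    W1 -= LR * np.outer(grad_h, encode(s)); b1 -= LR * grad_h
    s = s2 if s2 != N_STATES - 1 else 0

# Greedy policy learned off-policy from the random behaviour data.
print([int(np.argmax(q_values(s)[0])) for s in range(N_STATES)])
```

The key point mirrored from the abstract is that the data-collecting policy never has to be perturbed: the max operator in the target makes the update off-policy, and the MLP keeps the value representation compact regardless of state-space size.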
Keywords :
Markov processes; decision making; interactive systems; learning (artificial intelligence); optimisation; speech recognition; speech-based user interfaces; RL methods; SDS optimisation; compact nonlinear value function representation; large-scale POMDP-based dialogue systems; learning algorithm; off-policy RL algorithm; off-policy learning; online RL algorithm; optimal strategy; partially observable Markov decision process; reinforcement learning; sample-efficient RL algorithm; spoken dialogue system optimisation; system behaviours; Approximation methods; Estimation; Learning; Neurons; Noise measurement; Optimization; Speech; Reinforcement Learning; Spoken Dialogue Systems;
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6289040