DocumentCode :
3646039
Title :
On-line policy optimisation of spoken dialogue systems via live interaction with human subjects
Author :
Milica Gašić;Filip Jurčíček;Blaise Thomson;Kai Yu;Steve Young
Author_Institution :
Cambridge University Engineering Department, Trumpington St, Cambridge CB2 1PZ, UK
fYear :
2011
Firstpage :
312
Lastpage :
317
Abstract :
Statistical dialogue models typically require a large number of dialogues to optimise the dialogue policy and therefore rely on a simulated user. This results in a mismatch between training and live conditions, and the significant cost of developing the simulator mitigates many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment in which a policy for a real-world task is learnt directly from human interaction, using rewards provided by the users themselves. It shows that a usable policy can be learnt in just a few hundred dialogues without a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning over several thousand dialogues, and highlights the need for robustness to noisy rewards.
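The abstract's key idea is that Gaussian process regression gives not only an estimate of expected reward but also an uncertainty, which a learning strategy can use to avoid risky actions. The following is a minimal illustrative sketch of GP posterior inference, not the paper's actual GP-SARSA algorithm; the squared-exponential kernel, hyperparameters, and toy data are assumptions for demonstration only.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=0.1):
    """Posterior mean and pointwise variance of a zero-mean GP at X_test,
    conditioned on noisy observations (X_train, y_train)."""
    K = rbf_kernel(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    K_ss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Toy example: rewards observed at three points in a 1-D feature space.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.0])
X_test = np.array([[1.0], [5.0]])               # near data vs. far from data
mean, var = gp_posterior(X_train, y_train, X_test)
```

A risk-averse strategy of the kind hinted at in the abstract would prefer actions where the posterior variance is low: here `var[0]` (near observed data) is much smaller than `var[1]` (far from it), so the second point would be treated as risky.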
Keywords :
"Training","Error analysis","Learning systems","Gaussian processes","Kernel","Humans","Robustness"
Publisher :
ieee
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Print_ISBN :
978-1-4673-0365-1
Type :
conf
DOI :
10.1109/ASRU.2011.6163950
Filename :
6163950