Title :
Temporal supervised learning for inferring a dialog policy from example conversations
Author :
Lihong Li; He He; Jason D. Williams
Author_Institution :
Microsoft Research, Redmond, WA, USA
Abstract :
This paper tackles the problem of learning a dialog policy from example dialogs, for example from Wizard-of-Oz style dialogs in which an expert (a person) plays the role of the system. Learning in this setting is challenging because dialog is a temporal process in which actions affect the future course of the conversation; that is, dialog requires planning. Past work has addressed this problem with either conventional supervised learning or reinforcement learning. Reinforcement learning provides a principled approach to planning, but requires resources beyond a fixed corpus of examples, such as a dialog simulator or a reward function. Conventional supervised learning, by contrast, operates directly on example dialogs but does not properly account for planning. We introduce a new algorithm, called Temporal Supervised Learning, which learns directly from example dialogs while also properly accounting for planning. The key idea is to choose the next dialog action so as to maximize the expected discounted accuracy until the end of the dialog. On a dialog testbed in the calendar domain, in simulation, we show that a dialog manager trained with temporal supervised learning substantially outperforms a baseline trained with conventional supervised learning.
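The key idea in the abstract, choosing each action to maximize expected discounted accuracy to the end of the dialog, can be illustrated with a fitted Q-style loop over the example corpus: the per-turn reward is agreement with the expert's action, and a regressor per action predicts the discounted accuracy-to-go. The sketch below is an assumption-laden reading of the abstract, not the paper's actual algorithm; the function names, the Ridge regressor, and the bootstrapped target are all illustrative choices.

    import numpy as np
    from sklearn.linear_model import Ridge

    def temporal_supervised_learning(dialogs, actions, gamma=0.9, iters=10):
        # dialogs: list of turn sequences; each turn is (features, expert_action),
        # with features a 1-D numpy array. 'actions' is the discrete action set.
        # Sketch only: learn Q(s, a) ~ expected discounted per-turn accuracy,
        # where reward is 1 when the chosen action matches the expert's, else 0.
        models = {a: Ridge(alpha=1.0) for a in actions}  # one regressor per action
        fitted = False
        for _ in range(iters):
            X = {a: [] for a in actions}
            y = {a: [] for a in actions}
            for dialog in dialogs:
                for t, (feats, expert_a) in enumerate(dialog):
                    # Bootstrap from the next turn's value (0 at dialog end,
                    # and 0 on the first pass before any model is fitted).
                    if t + 1 < len(dialog) and fitted:
                        nxt = dialog[t + 1][0].reshape(1, -1)
                        future = max(models[a].predict(nxt)[0] for a in actions)
                    else:
                        future = 0.0
                    for a in actions:
                        # Reward = agreement with the expert at this turn.
                        X[a].append(feats)
                        y[a].append(float(a == expert_a) + gamma * future)
            for a in actions:
                models[a].fit(np.array(X[a]), np.array(y[a]))
            fitted = True
        return models

    def choose_action(models, feats, actions):
        # Greedy policy: pick the action with the highest predicted
        # discounted accuracy-to-go.
        feats = feats.reshape(1, -1)
        return max(actions, key=lambda a: models[a].predict(feats)[0])

Note the contrast the abstract draws: a conventional supervised baseline would fit only the immediate agreement term (gamma = 0), whereas the discounted target lets a turn's value depend on how the rest of the dialog unfolds.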
Keywords :
interactive systems; learning (artificial intelligence); dialog policy learning; example conversations; example dialogs; reinforcement learning; temporal process; temporal supervised learning; Accuracy; Planning; Semantics; Stochastic processes; Supervised learning; Training
Conference_Title :
2014 IEEE Spoken Language Technology Workshop (SLT)
DOI :
10.1109/SLT.2014.7078593