DocumentCode
2769995
Title
A method for evaluating and comparing user simulations: The Cramér-von Mises divergence
Author
Williams, Jason D.
Author_Institution
AT&T Labs - Research, Florham Park
fYear
2007
fDate
9-13 Dec. 2007
Firstpage
508
Lastpage
513
Abstract
Although user simulations are increasingly employed in the development and assessment of spoken dialog systems, there is no accepted method for evaluating user simulations. In this paper, we propose a novel quality measure for user simulations. We view a user simulation as a predictor of the performance of a dialog system, where per-dialog performance is measured with a domain-specific scoring function. The quality of the user simulation is measured as the divergence between the distribution of scores in real dialogs and simulated dialogs, and we argue that the Cramér-von Mises divergence is well-suited to this task. The technique is demonstrated on a corpus of real calls, and we present a table of critical values for practitioners to interpret the statistical significance of comparisons between user simulations.
Keywords
interactive systems; speech recognition; statistical analysis; user modelling; Cramér-von Mises divergence; domain-specific scoring function; quality measure; spoken dialog system; user simulation; Algorithm design and analysis; Computational modeling; Design optimization; Hidden Markov models; Humans; Laboratories; Machine learning; Machine learning algorithms; Predictive models; dialog management; dialog simulation
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location
Kyoto
Print_ISBN
978-1-4244-1746-9
Electronic_ISBN
978-1-4244-1746-9
Type
conf
DOI
10.1109/ASRU.2007.4430164
Filename
4430164