• DocumentCode
    2769995
  • Title
    A method for evaluating and comparing user simulations: The Cramér-von Mises divergence
  • Author
    Williams, Jason D.
  • Author_Institution
    AT&T Labs - Research, Florham Park
  • fYear
    2007
  • fDate
    9-13 Dec. 2007
  • Firstpage
    508
  • Lastpage
    513
  • Abstract
    Although user simulations are increasingly employed in the development and assessment of spoken dialog systems, there is no accepted method for evaluating user simulations. In this paper, we propose a novel quality measure for user simulations. We view a user simulation as a predictor of the performance of a dialog system, where per-dialog performance is measured with a domain-specific scoring function. The quality of the user simulation is measured as the divergence between the distribution of scores in real dialogs and simulated dialogs, and we argue that the Cramér-von Mises divergence is well-suited to this task. The technique is demonstrated on a corpus of real calls, and we present a table of critical values for practitioners to interpret the statistical significance of comparisons between user simulations.
  • Keywords
    interactive systems; speech recognition; statistical analysis; user modelling; Cramer-von Mises divergence; domain-specific scoring function; quality measure; spoken dialog system; user simulation; Algorithm design and analysis; Computational modeling; Design optimization; Hidden Markov models; Humans; Laboratories; Machine learning; Machine learning algorithms; Predictive models; dialog management; dialog simulation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007
  • Conference_Location
    Kyoto
  • Print_ISBN
    978-1-4244-1746-9
  • Electronic_ISBN
    978-1-4244-1746-9
  • Type
    conf
  • DOI
    10.1109/ASRU.2007.4430164
  • Filename
    4430164
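The abstract describes the measure only at a high level: score each real and each simulated dialog with a domain-specific scoring function, then measure how far apart the two score distributions are. The sketch below illustrates that idea under stated assumptions. It uses the textbook two-sample Cramér-von Mises statistic, not necessarily the exact normalized divergence defined in the paper, so its values should not be read against the paper's table of critical values. The names `score_dialog`, `ecdf` and `cvm_two_sample` are hypothetical, and the scoring weights and toy dialogs are made up for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def score_dialog(task_completed: bool, num_turns: int) -> float:
    """Hypothetical domain-specific scoring function (weights are made up):
    +20 for task completion, -1 per dialog turn."""
    return (20.0 if task_completed else 0.0) - num_turns


def ecdf(sorted_sample: Sequence[float], x: float) -> float:
    """Right-continuous empirical CDF: fraction of sample values <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def cvm_two_sample(real_scores: Sequence[float], sim_scores: Sequence[float]) -> float:
    """Textbook two-sample Cramer-von Mises statistic (not the paper's normalization):

        T = N*M / (N+M)**2 * sum over pooled z of (F_N(z) - G_M(z))**2

    where F_N and G_M are the empirical CDFs of the real and simulated
    scores. T is 0 exactly when the two empirical distributions coincide.
    """
    if not real_scores or not sim_scores:
        raise ValueError("both score samples must be non-empty")
    f, g = sorted(real_scores), sorted(sim_scores)
    n, m = len(f), len(g)
    # Evaluate both empirical CDFs at every pooled score and sum squared gaps.
    squared_gaps = sum((ecdf(f, z) - ecdf(g, z)) ** 2 for z in f + g)
    return n * m / (n + m) ** 2 * squared_gaps


if __name__ == "__main__":
    # Illustrative toy dialogs as (task_completed, num_turns); not real data.
    real = [score_dialog(c, t) for c, t in [(True, 6), (True, 9), (False, 12), (True, 7)]]
    sim_a = [score_dialog(c, t) for c, t in [(True, 5), (True, 8), (False, 14), (True, 7)]]
    sim_b = [score_dialog(c, t) for c, t in [(True, 4), (True, 4), (True, 5), (True, 5)]]
    # A smaller statistic means the simulated score distribution is closer to the real one.
    print("sim A:", cvm_two_sample(real, sim_a))
    print("sim B:", cvm_two_sample(real, sim_b))
```

Working on the score distributions, rather than on turn-level behaviour, is what lets two simulations be ranked against the same real corpus with a single number. Because the statistic's null distribution depends on the sample sizes, deciding whether a gap between two simulations is statistically significant still requires critical values; the paper provides its own table for its divergence, which this sketch does not reproduce.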