Title :
Semantic Class Induction for Language Model Adaptation in a Chinese Voice Search System
Author :
Li, Yali ; Xu, Weiqun ; Bao, Changchun ; Li, Ta ; Pan, Jielin ; Yan, Yonghong
Author_Institution :
ThinkIT Lab., Chinese Acad. of Sci., Beijing, China
Abstract :
In this paper we describe our work on generating an in-domain corpus from automatically induced semantic classes and structures, for language model adaptation in a voice search dialogue system. We propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new similarity measure outperforms clustering with the widely used distance measure based on Kullback-Leibler divergence. For language model adaptation, we adopt the widely used approach of model interpolation. Experiments show that both human-human data and generated data improve the adapted model, and that the generated data improves it more. This suggests that the generated data is closer to the domain of human-computer dialogues than the human-human data is. An interpolated language model achieves a character recognition error rate of 9.0% and a perplexity of 25.5 on the test data.
Keywords :
human computer interaction; information retrieval; interactive systems; interpolation; natural language processing; probability; speech processing; speech recognition; Chinese voice search system; Kullback-Leibler divergence; automatic speech recognition; co-occurrence probabilities; distance measure; human-computer dialogues; language model adaptation; model interpolation; semantic class induction; voice search dialogue system; Adaptation model; Context; Data models; Hidden Markov models; Measurement; Semantics; Speech recognition; Semantic class induction; corpus generation; language model adaptation;
Conference_Title :
Electrical and Control Engineering (ICECE), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6880-5
DOI :
10.1109/iCECE.2010.460