Title :
Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection
Author :
Wang, Dong ; King, Simon ; Frankel, Joe
Author_Institution :
EdSST Marie Curie Training Program, Univ. of Edinburgh, Edinburgh, UK
fDate :
5/1/2011
Abstract :
Spoken term detection (STD) is the task of searching large amounts of audio for occurrences of spoken terms, which are typically single words or short phrases. One reason STD is hard is that search terms tend to contain a disproportionate number of out-of-vocabulary (OOV) words. The most common approach to STD uses subword units. Combined with some method for predicting the pronunciations of OOVs from their written form, this enables the detection of OOV terms, but performance is considerably worse than for in-vocabulary terms. This performance gap can be largely attributed to the special properties of OOVs. One such property is the high degree of uncertainty in their pronunciation. We present a stochastic pronunciation model (SPM) that explicitly deals with this uncertainty. The key insight is to search for all possible pronunciations when detecting an OOV term, rather than committing to a single one. This requires a probabilistic model of pronunciation that can estimate a distribution over all possible pronunciations. We use a joint-multigram model (JMM) for this and compare the JMM-based SPM with the conventional soft match approach. Experiments using speech from the meetings domain demonstrate that the SPM performs better than soft match in most operating regions, especially at low false alarm probabilities. Furthermore, SPM and soft match are found to be complementary: their combination provides further performance gains.
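The central idea in the abstract, scoring a putative detection by weighting every candidate pronunciation by its probability instead of committing to one, can be illustrated with a minimal Python sketch. It assumes a unigram joint-multigram (graphone) model with a hand-written toy inventory (`GRAPHONES`), a fixed list of candidate pronunciations, and hypothetical per-pronunciation detector confidences; none of these names or numbers come from the paper, and the authors' JMM training, search, and scoring may differ. `joint_prob` sums over all segmentations of spelling and pronunciation into graphones, `pronunciation_posterior` normalizes over the candidates to give P(pronunciation | spelling), and `spm_score` takes the expected confidence under that distribution.

```python
from functools import lru_cache

# Hypothetical toy graphone inventory: (letters, phones) -> unigram probability.
# A real JMM is trained with EM on a pronunciation lexicon and usually uses
# n-gram context over graphones; this unigram table is illustrative only.
GRAPHONES = {
    ("c", ("k",)): 0.20,
    ("c", ("s",)): 0.10,
    ("a", ("ae",)): 0.15,
    ("a", ("ey",)): 0.10,
    ("t", ("t",)): 0.25,
    ("at", ("ae", "t")): 0.20,
}


def joint_prob(letters: str, phones: tuple[str, ...]) -> float:
    """P(letters, phones) summed over every segmentation into graphones."""

    @lru_cache(maxsize=None)
    def alpha(i: int, j: int) -> float:
        # Total probability of generating letters[:i] together with phones[:j].
        if i == 0 and j == 0:
            return 1.0
        total = 0.0
        for (g, ph), p in GRAPHONES.items():
            li, lj = len(g), len(ph)
            if li <= i and lj <= j and letters[i - li:i] == g and phones[j - lj:j] == ph:
                total += alpha(i - li, j - lj) * p
        return total

    return alpha(len(letters), len(phones))


def pronunciation_posterior(word, candidates):
    """Normalize joint probabilities into P(pronunciation | spelling)."""
    joint = {p: joint_prob(word, p) for p in candidates}
    z = sum(joint.values())
    if z == 0.0:
        raise ValueError(f"no candidate pronunciation is derivable for {word!r}")
    return {p: v / z for p, v in joint.items() if v > 0.0}


def spm_score(posterior, detection_conf):
    """Expected detection confidence over all pronunciations (SPM-style)."""
    return sum(prob * detection_conf.get(pron, 0.0) for pron, prob in posterior.items())


if __name__ == "__main__":
    candidates = [("k", "ae", "t"), ("s", "ae", "t"), ("k", "ey", "t"), ("s", "ey", "t")]
    post = pronunciation_posterior("cat", candidates)
    # Hypothetical detector confidences at one audio location, per pronunciation.
    conf = {("k", "ae", "t"): 0.1, ("s", "ae", "t"): 0.8}
    best = max(post, key=post.get)
    print("1-best score:", conf.get(best, 0.0))  # 0.1
    print("SPM score:", round(spm_score(post, conf), 3))  # 0.302
```

With these toy numbers the most likely pronunciation of "cat" is /k ae t/ (posterior 38/63), so a detector that commits to it scores this location at only 0.1, while the expectation over all pronunciations gives about 0.302 because /s ae t/ still carries probability 19/63. The soft match baseline that the paper compares against is not modeled in this sketch.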
Keywords :
audio signal processing; search problems; speech recognition; stochastic processes; vocabulary; word processing; audio search; false alarm probability; joint multigram model; out of vocabulary spoken term detection; probabilistic model; pronunciation uncertainty; soft match; stochastic pronunciation modeling; Letter-to-sound; out-of-vocabulary (OOV); pronunciation modeling; spoken term detection (STD)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2010.2058800