DocumentCode :
1398776
Title :
Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields
Author :
Wang, Dong ; King, Simon
Author_Institution :
CSTR, Univ. of Edinburgh, Edinburgh, UK
Volume :
18
Issue :
2
fYear :
2011
Firstpage :
122
Lastpage :
125
Abstract :
Pronunciation prediction, or letter-to-sound (LTS) conversion, is an essential task for speech synthesis, open vocabulary spoken term detection and other applications dealing with novel words. Most current approaches (at least for English) employ data-driven methods to learn and represent pronunciation “rules” using statistical models such as decision trees, hidden Markov models (HMMs) or joint-multigram models (JMMs). The LTS task remains challenging, particularly for languages with a complex relationship between spelling and pronunciation such as English. In this paper, we propose to use a conditional random field (CRF) to perform LTS because it avoids having to model a distribution over observations and can perform global inference, suggesting that it may be more suitable for LTS than decision trees, HMMs or JMMs. One challenge in applying CRFs to LTS is that the phoneme and grapheme sequences of a word are generally of different lengths, which makes CRF training difficult. To solve this problem, we employ a joint-multigram model to generate aligned training exemplars. Experiments conducted with the AMI05 dictionary demonstrate that a CRF significantly outperforms the other models, especially if n-best lists of predictions are generated.
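The two ideas in the abstract, making the grapheme and phoneme sequences the same length and then decoding globally with a CRF, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a hypothetical labeling scheme: each chunk in a JMM-style alignment has at least one letter; the first letter of a chunk carries the chunk's phonemes, joined with "+" when there are several; every other letter gets a null label "_". The function names `letters_to_labels` and `viterbi`, the phoneme symbols, and all scores are hypothetical. The hand-set log-potentials stand in for the feature weights a trained linear-chain CRF would supply.

```python
from typing import Dict, List, Sequence, Tuple

NULL = "_"  # hypothetical label for a letter that emits no phoneme


def letters_to_labels(
    alignment: Sequence[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Expand (grapheme chunk, phoneme chunk) pairs into one label per letter.

    Assumed scheme: the first letter of a chunk carries the chunk's phonemes
    (space-separated in the input, joined with '+'); remaining letters, and
    chunks with no phonemes, get NULL. The output lists have equal length,
    which is what a per-letter CRF tagger needs for training.
    """
    letters: List[str] = []
    labels: List[str] = []
    for graphemes, phonemes in alignment:
        for i, ch in enumerate(graphemes):
            letters.append(ch)
            if i == 0 and phonemes:
                labels.append("+".join(phonemes.split()))
            else:
                labels.append(NULL)
    return letters, labels


def viterbi(
    emit: List[Dict[str, float]], trans: Dict[Tuple[str, str], float]
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emit[t][y] is the log-potential of label y at letter position t;
    trans[(y_prev, y)] is the transition log-potential (missing pairs score 0).
    The search is over whole sequences, i.e. global inference.
    """
    best = dict(emit[0])  # best score of any path ending in each label
    back: List[Dict[str, str]] = []  # backpointers for positions 1..T-1
    for t in range(1, len(emit)):
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y, e in emit[t].items():
            prev, score = max(
                ((p, best[p] + trans.get((p, y), 0.0)) for p in best),
                key=lambda item: item[1],
            )
            new_best[y] = score + e
            ptr[y] = prev
        best = new_best
        back.append(ptr)
    path = [max(best, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(letters_to_labels([("ph", "f"), ("o", "ow"), ("n", "n"), ("e", "")]))
    emit = [{"a": 0.1, "b": 0.4}, {"a": 0.3, "b": 0.2}, {"a": 0.5, "b": 0.0}]
    trans = {("a", "a"): -1.0, ("b", "a"): 0.7, ("a", "b"): 0.2}
    print(viterbi(emit, trans))
```

With these toy scores, `letters_to_labels` maps "phone" to the letters `p h o n e` and the labels `f _ ow n _`. `viterbi` returns `b b a` (total score 1.8). Picking the best label at each letter independently would instead give `b a a` (score 0.9), because that choice ignores the transition potentials. This is the kind of gap the abstract's "global inference" argument points to.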
Keywords :
decision trees; grammars; hidden Markov models; natural language processing; random processes; speech synthesis; statistical analysis; vocabulary; AMI05 dictionary; CRF training; English; HMM; JMM; LTS conversion; aligned training exemplars; conditional random fields; decision trees; global inference; grapheme sequences; hidden Markov models; joint-multigram models; letter-to-sound pronunciation prediction; open vocabulary spoken term detection; phoneme sequences; pronunciation rules; speech synthesis; spelling; statistical models; Data models; Decision trees; Hidden Markov models; Joints; Markov processes; Predictive models; Training; Conditional random field; joint multigram model; letter-to-sound; speech synthesis; spoken term detection;
fLanguage :
English
Journal_Title :
Signal Processing Letters, IEEE
Publisher :
IEEE
ISSN :
1070-9908
Type :
jour
DOI :
10.1109/LSP.2010.2098440
Filename :
5661808