Regional Information Center for Science and Technology - Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields

DocumentCode :

1398776

Title :

Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields

Author :

Wang, Dong ; King, Simon

Author_Institution :

CSTR, Univ. of Edinburgh, Edinburgh, UK

Volume :

Issue :

Year :

2011

Firstpage :

122

Lastpage :

125

Abstract :

Pronunciation prediction, or letter-to-sound (LTS) conversion, is an essential task for speech synthesis, open vocabulary spoken term detection and other applications dealing with novel words. Most current approaches (at least for English) employ data-driven methods to learn and represent pronunciation “rules” using statistical models such as decision trees, hidden Markov models (HMMs) or joint-multigram models (JMMs). The LTS task remains challenging, particularly for languages with a complex relationship between spelling and pronunciation such as English. In this paper, we propose to use a conditional random field (CRF) to perform LTS because it avoids having to model a distribution over observations and can perform global inference, suggesting that it may be more suitable for LTS than decision trees, HMMs or JMMs. One challenge in applying CRFs to LTS is that the phoneme and grapheme sequences of a word are generally of different lengths, which makes CRF training difficult. To solve this problem, we employed a joint-multigram model to generate aligned training exemplars. Experiments conducted with the AMI05 dictionary demonstrate that a CRF significantly outperforms other models, especially if n-best lists of predictions are generated.
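The length mismatch mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical alignment format, a list of (letters, phonemes) units, which this record does not specify; the phoneme symbols, the "_" empty label, and the helper names `to_letter_labels` and `window_features` are illustrative assumptions, not the authors' implementation. The sketch only shows how an aligned exemplar could be expanded into one label per letter, the equal-length input that a linear-chain CRF needs.

```python
from typing import Dict, List, Tuple

# Hypothetical alignment format (an assumption, not taken from the paper):
# one (letters, phonemes) pair per aligned unit, phonemes separated by spaces.
Alignment = List[Tuple[str, str]]


def to_letter_labels(alignment: Alignment) -> Tuple[List[str], List[str]]:
    """Expand aligned units into one label per letter.

    The first letter of each unit carries the unit's phonemes (joined with '+');
    remaining letters, and letters aligned to no phoneme, get the label '_'.
    """
    letters: List[str] = []
    labels: List[str] = []
    for graphemes, phonemes in alignment:
        for i, letter in enumerate(graphemes):
            letters.append(letter)
            labels.append(phonemes.replace(" ", "+") if i == 0 and phonemes else "_")
    return letters, labels


def window_features(letters: List[str], i: int, width: int = 2) -> Dict[str, str]:
    """Context-window features for position i; '#' pads beyond the word edges."""
    feats: Dict[str, str] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        feats[f"letter[{offset:+d}]"] = letters[j] if 0 <= j < len(letters) else "#"
    return feats


# Illustrative exemplar for "phone": five letters, three phonemes.
letters, labels = to_letter_labels([("ph", "f"), ("o", "ow"), ("n", "n"), ("e", "")])
features = [window_features(letters, i) for i in range(len(letters))]
```

After this expansion the letter sequence and the label sequence have the same length, so each word becomes an ordinary sequence-labeling example; the feature dictionaries are one plausible input shape for a CRF toolkit, chosen here only for illustration.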

Keywords :

decision trees; grammars; hidden Markov models; natural language processing; random processes; speech synthesis; statistical analysis; vocabulary; AMI05 dictionary; CRF training; English; HMM; JMM; LTS conversion; aligned training exemplars; conditional random fields; global inference; grapheme sequences; joint-multigram models; letter-to-sound pronunciation prediction; open vocabulary spoken term detection; phoneme sequences; pronunciation rules; spelling; statistical models; Data models; Joints; Markov processes; Predictive models; Training; Conditional random field; joint multigram model; letter-to-sound; spoken term detection

Language :

English

Journal_Title :

Signal Processing Letters, IEEE

Publisher :

IEEE

ISSN :

1070-9908

Type :

jour

DOI :

10.1109/LSP.2010.2098440

Filename :

5661808

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1398776