Title :
Web-derived pronunciations
Author :
Ghoshal, Arnab ; Jansche, Martin ; Khudanpur, Sanjeev ; Riley, Michael ; Ulinski, Morgan
Author_Institution :
Johns Hopkins Univ., Baltimore, MD
Abstract :
Pronunciation information is available in large quantities on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We show improvements on a letter-to-phoneme task when using Web-derived rather than Pronlex pronunciations.
Keywords :
Internet; speech processing; IPA pronunciations; Pronlex pronunciations; Web-derived pronunciations; ad-hoc transcriptions; candidate pronunciation extraction; letter-to-phoneme task; orthographic words; Automatic speech recognition; Data mining; Decision trees; Hidden Markov models; Information filtering; Information filters; Law; Speech synthesis; Web pages
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960577