DocumentCode :
3131854
Title :
Word segmentation through cross-lingual word-to-phoneme alignment
Author :
Stahlberg, F. ; Schlippe, Tim ; Vogel, Stephan ; Schultz, Tanja
Author_Institution :
Cognitive Syst. Lab., Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany
fYear :
2012
fDate :
2-5 Dec. 2012
Firstpage :
85
Lastpage :
90
Abstract :
We present our new alignment model, Model 3P, for cross-lingual word-to-phoneme alignment and show that unsupervised learning of word segmentation is more accurate when information from another language is used. Word segmentation with cross-lingual information is highly relevant for bootstrapping pronunciation dictionaries from audio data for Automatic Speech Recognition, bypassing the written form in Speech-to-Speech Translation, or building the vocabulary of an unseen language, particularly in the context of under-resourced languages. Model 3P, applied to the alignment between English words and Spanish phonemes, outperforms a state-of-the-art monolingual word segmentation approach [1] on the BTEC corpus [2] by up to 42% absolute in F-Score on the phoneme level and a GIZA++ alignment based on IBM Model 3 by up to 17%.
Keywords :
natural language processing; speech recognition; unsupervised learning; English words; Spanish phonemes; automatic speech recognition; bootstrap pronunciation dictionaries; cross-lingual information; cross-lingual word-to-phoneme alignment; speech-to-speech translation; word segmentation; Dictionaries; Error analysis; Grammar; Hidden Markov models; Training data; Vectors; Vocabulary; alignment model; under-resourced language
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
Type :
conf
DOI :
10.1109/SLT.2012.6424202
Filename :
6424202