Regional Information Center for Science and Technology - Word segmentation through cross-lingual word-to-phoneme alignment

DocumentCode :

3131854

Title :

Word segmentation through cross-lingual word-to-phoneme alignment

Author :

Stahlberg, F. ; Schlippe, Tim ; Vogel, Stephan ; Schultz, Tanja

Author_Institution :

Cognitive Syst. Lab., Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany

fYear :

2012

fDate :

2-5 Dec. 2012

Firstpage :

Lastpage :

Abstract :

We present Model 3P, our new model for cross-lingual word-to-phoneme alignment, and show that unsupervised learning of word segmentation is more accurate when information from another language is used. Word segmentation with cross-lingual information is highly relevant for bootstrapping pronunciation dictionaries from audio data for Automatic Speech Recognition, bypassing the written form in Speech-to-Speech Translation, or building the vocabulary of an unseen language, particularly in the context of under-resourced languages. Using Model 3P for the alignment between English words and Spanish phonemes outperforms a state-of-the-art monolingual word segmentation approach [1] on the BTEC corpus [2] by up to 42% absolute in F-Score on the phoneme level, and a GIZA++ alignment based on IBM Model 3 by up to 17%.

Keywords :

natural language processing; speech recognition; unsupervised learning; English words; Spanish phonemes; automatic speech recognition; bootstrap pronunciation dictionaries; cross lingual information; cross lingual word-to-phoneme alignment; speech-to-speech translation; word segmentation; Dictionaries; Error analysis; Grammar; Hidden Markov models; Training data; Vectors; Vocabulary; alignment model; under-resourced language

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Spoken Language Technology Workshop (SLT), 2012 IEEE

Conference_Location :

Miami, FL

Print_ISBN :

978-1-4673-5125-6

Electronic_ISBN :

978-1-4673-5124-9

Type :

conf

DOI :

10.1109/SLT.2012.6424202

Filename :

6424202

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3131854