Abstract:
A vocabulary list and a language model are primary components of a speech translation system. Generating both from plain text is straightforward for English, but it is quite challenging for languages such as Chinese, Japanese, and Thai, whose texts contain no word boundary delimiters. For Thai word segmentation, maximal matching, a lexicon-based approach, is one of the most popular methods. Nevertheless, this method relies heavily on the coverage of the lexicon: when the text contains an unknown word, it usually produces wrong boundaries, and some words are then lost when a vocabulary is extracted from the segmented text. In this paper, we propose statistical techniques to tackle this problem. We build speech translation systems based on different word segmentation methods and show that the proposed method significantly improves translation accuracy, by about 6.42 BLEU points over the baseline system.
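To make the unknown-word failure mode concrete, the following is a minimal sketch of lexicon-based maximal matching (not the paper's implementation): a dynamic program that picks the segmentation with the fewest lexicon words, falling back to single characters for spans the lexicon does not cover. The toy English lexicon and the function name are illustrative assumptions; Thai text would be handled the same way, character by character.

```python
def maximal_matching(text, lexicon):
    """Segment `text` into the fewest lexicon words via dynamic programming.

    Spans not covered by any lexicon word fall back to single characters,
    mimicking how an unknown word forces wrong boundaries. Toy example only.
    """
    n = len(text)
    best = [None] * (n + 1)   # best[i] = fewest words covering text[:i]
    best[0] = 0
    back = [0] * (n + 1)      # back[i] = start of the last word in the best split
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is None:
                continue
            piece = text[j:i]
            # Accept a lexicon word, or a single-character fallback for unknowns.
            if piece in lexicon or i - j == 1:
                if best[i] is None or best[j] + 1 < best[i]:
                    best[i] = best[j] + 1
                    back[i] = j
    # Recover the segmentation by walking the backpointers.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

lexicon = {"in", "the", "then", "there", "after"}
print(maximal_matching("thereafter", lexicon))   # in-lexicon text segments cleanly
print(maximal_matching("thereafterx", lexicon))  # an unknown tail degrades to characters
```

The second call shows the problem the abstract describes: the unknown suffix is shattered into single characters, so no extractable word corresponds to it.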
Keywords:
feature extraction; language translation; natural language processing; speech recognition; statistical analysis; vocabulary; Thai speech translation; language model; lexicon-based approach; maximal matching; statistical techniques; text segmentation; vocabulary list; word extraction; word segmentation; automatic speech recognition; dictionaries; entropy; natural languages; statistical machine translation; text processing; training data; spoken language translation