Regional Information Center for Science and Technology (RICeST) - N-gram Language Model Based on Multi-Word Expressions in Web Documents for Speech Recognition and Closed-Captioning

DocumentCode :

3490377

Title :

N-gram Language Model Based on Multi-Word Expressions in Web Documents for Speech Recognition and Closed-Captioning

Author :

Takahashi, Satoshi ; Morimoto, Takuya

Author_Institution :

Dept. of Elec. Eng. Comp. Sci., Fukuoka Univ., Fukuoka, Japan

fYear :

2012

fDate :

13-15 Nov. 2012

Firstpage :

225

Lastpage :

228

Abstract :

Automatic speech recognition is commonly used to align closed-caption text with video data, so higher recognition accuracy is important for accurate closed-captioning. This paper proposes a method for constructing an N-gram language model based on multi-word expressions (MWEs) drawn from web retrieval results, with the aim of improving speech recognition performance. Two experiments are conducted: a web retrieval experiment that examines the distribution of web counts for MWEs, and a speech recognition experiment that investigates the effectiveness of MWEs. The experimental results show that the proposed method can improve both recognition performance and closed-captioning accuracy.
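
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of an MWE-based N-gram model, not the authors' method. It assumes that candidate MWEs are kept when a web hit count reaches a threshold, that each kept MWE is merged into a single token, and that a plain maximum-likelihood bigram model is then estimated. The function names, the threshold, the counts, and the sentences are all hypothetical.

```python
from collections import Counter


def select_mwes(web_counts, min_count):
    """Keep candidate MWEs whose (hypothetical) web hit count reaches min_count."""
    return {mwe for mwe, count in web_counts.items() if count >= min_count}


def merge_mwes(tokens, mwes):
    """Greedily replace each known multi-word expression with one joined token."""
    out = []
    i = 0
    max_len = max((len(m) for m in mwes), default=1)
    while i < len(tokens):
        # Try the longest candidate first so longer MWEs win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            cand = tuple(tokens[i:i + n])
            if cand in mwes:
                out.append("_".join(cand))
                i += n
                break
        else:
            # No MWE starts at this position; keep the single word.
            out.append(tokens[i])
            i += 1
    return out


def bigram_mle(sentences):
    """Maximum-likelihood bigram probabilities P(w2 | w1) with sentence markers."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}


# Made-up web counts and sentences, for illustration only.
web_counts = {
    ("closed", "caption"): 1200,
    ("speech", "recognition"): 5000,
    ("the", "caption"): 3,
}
mwes = select_mwes(web_counts, min_count=100)
corpus = [
    "speech recognition aligns the closed caption".split(),
    "speech recognition improves the closed caption".split(),
]
merged = [merge_mwes(sentence, mwes) for sentence in corpus]
model = bigram_mle(merged)
```

In this sketch a merged unit such as `speech_recognition` becomes a single vocabulary entry, so a bigram over merged tokens covers a longer span of the original words. A practical model would also need smoothing, which is omitted here.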

Keywords :

natural language processing; speech recognition; text analysis; video retrieval; word processing; MWE; Web count number distribution; Web document retrieval; automatic speech recognition technique; closed caption text alignment accuracy improvement; multiword expressions; n-gram language model construction; speech recognition performance improvement; video data; Adaptation models; Computational modeling; Probability; Speech; Speech recognition; Training; Vocabulary; N-gram; closed-captioning; language model; speech recognition; web documents

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Asian Language Processing (IALP), 2012 International Conference on

Conference_Location :

Hanoi

Print_ISBN :

978-1-4673-6113-2

Electronic_ISBN :

978-0-7695-4886-9

Type :

conf

DOI :

10.1109/IALP.2012.55

Filename :

6473737

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3490377