DocumentCode :
3490377
Title :
N-gram Language Model Based on Multi-Word Expressions in Web Documents for Speech Recognition and Closed-Captioning
Author :
Takahashi, Satoshi ; Morimoto, Takuya
Author_Institution :
Dept. of Elec. Eng. Comp. Sci., Fukuoka Univ., Fukuoka, Japan
fYear :
2012
fDate :
13-15 Nov. 2012
Firstpage :
225
Lastpage :
228
Abstract :
Automatic speech recognition is generally used to align closed-caption text to video data, so increasing speech recognition accuracy is important for accurate closed-captioning. This paper proposes a method for constructing an N-gram language model based on multi-word expressions (MWEs) obtained from web retrieval results, with the aim of improving speech recognition performance. Two experiments are conducted: a web retrieval experiment that examines the distribution of web counts for MWEs, and a speech recognition experiment that investigates the effectiveness of MWEs. The experimental results show that the proposed method can improve recognition performance and closed-captioning accuracy.
Keywords :
natural language processing; speech recognition; text analysis; video retrieval; word processing; MWE; Web count number distribution; Web document retrieval; automatic speech recognition technique; closed caption text alignment accuracy improvement; multiword expressions; n-gram language model construction; speech recognition performance improvement; video data; Adaptation models; Computational modeling; Probability; Speech; Speech recognition; Training; Vocabulary; N-gram; closed-captioning; language model; speech recognition; web documents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2012 International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4673-6113-2
Electronic_ISBN :
978-0-7695-4886-9
Type :
conf
DOI :
10.1109/IALP.2012.55
Filename :
6473737