Regional Information Center for Science and Technology - Data augmentation and language model adaptation

DocumentCode :

1749712

Title :

Data augmentation and language model adaptation

Author :

Janiszek, D. ; De Mori, R. ; Bechet, F.

Author_Institution :

LIA, Univ. of Avignon, France

Volume :

fYear :

2001

fDate :

2001

Firstpage :

549

Abstract :

A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram language model (LM). This method is based on numerical distances in a reduced space obtained by singular value decomposition. Rescoring word lattices in a spoken dialogue application using an LM containing augmented counts has led to a word error rate (WER) reduction of 6.5%. By further interpolating augmented counts with the counts extracted from a very large newspaper corpus, but only for selected histories, a total WER reduction of 11.7% was obtained. We show that this approach gives better results than a global count interpolation for all histories of the LM.
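
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that rows of the count matrix are histories and columns are predicted words, that distances are Euclidean in the SVD-reduced space, and that a history borrows counts from its nearest neighbors with a 1/(1+d) weight. The function names `augment_counts` and `interpolate_selected`, the parameters `rank`, `neighbors`, `weight`, and `lam`, and the weighting kernel are all hypothetical.

```python
import numpy as np

def augment_counts(counts, rank=2, neighbors=1, weight=0.5):
    """Add to each history row a distance-weighted share of its nearest rows' counts.

    Assumption: rows are histories, columns are predicted words.
    """
    counts = np.asarray(counts, dtype=float)
    # Coordinates of each history in the reduced space: U_k * S_k.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    coords = u[:, :rank] * s[:rank]
    # Pairwise Euclidean distances between histories in the reduced space.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a history is not its own neighbor
    augmented = counts.copy()
    for h in range(counts.shape[0]):
        nearest = np.argsort(dist[h])[:neighbors]
        for n in nearest:
            # Closer neighbors contribute more; the 1/(1+d) kernel is an assumption.
            augmented[h] += weight / (1.0 + dist[h, n]) * counts[n]
    return augmented

def interpolate_selected(counts, external, selected, lam=0.5):
    """Mix in external-corpus counts, but only for the selected history rows."""
    mixed = np.asarray(counts, dtype=float).copy()
    external = np.asarray(external, dtype=float)
    rows = list(selected)
    mixed[rows] = lam * mixed[rows] + (1.0 - lam) * external[rows]
    return mixed

# Toy 3x3 bigram count matrix: histories 0 and 1 behave alike, history 2 differs.
toy_counts = np.array([[4, 0, 1], [3, 0, 1], [0, 5, 0]])
toy_augmented = augment_counts(toy_counts)
toy_mixed = interpolate_selected(toy_counts, np.full((3, 3), 10.0), selected=[2])
```

In this toy case, history 0 borrows counts from history 1 because the two lie close together in the reduced space, and only history 2 is interpolated with the stand-in external counts. How the paper actually selects histories and sets the weights is not stated in this record.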

Keywords :

eigenvalues and eigenfunctions; natural languages; probability; singular value decomposition; speech recognition; 2-gram language model; automatic speech recognition systems; data augmentation; language model adaptation; numerical distances; rescoring word lattices; singular value decomposition; spoken dialogue; very large newspaper corpus; word error rate reduction; Adaptation model; Automatic speech recognition; Error analysis; History; Interpolation; Lattices; Matrix decomposition; Probability distribution; Singular value decomposition; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on

Conference_Location :

Salt Lake City, UT

ISSN :

1520-6149

Print_ISBN :

0-7803-7041-4

Type :

conf

DOI :

10.1109/ICASSP.2001.940890

Filename :

940890

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1749712