DocumentCode :
3426616
Title :
Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR
Author :
Zhou, Zhengyu ; Meng, Helen
Author_Institution :
The Chinese University of Hong Kong, Hong Kong
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
4933
Lastpage :
4936
Abstract :
Discriminative n-gram language modeling has been used to re-rank candidate recognition hypotheses for performance improvements in large vocabulary continuous speech recognition (LVCSR). Discriminative n-gram modeling is defined in a linear framework. This work demonstrates that the linear discriminative n-gram model can be recast as a pseudo-conventional n-gram model if the order of the discriminative n-gram model is no higher than the order of the n-gram model in the baseline recognizer. Thus the power of the discriminative n-gram model can be captured by mature n-gram-related techniques such as single-pass n-gram decoding or lattice rescoring. This work utilizes the pseudo-conventional n-gram model to rescore the recognition lattices that are generated during decoding. Compared to discriminative N-best re-ranking, this process of discriminative lattice rescoring (DLR) has two advantages: (1) discriminatively top-ranked utterance hypotheses within the lattice search spaces can be efficiently identified by the A* algorithm; (2) the rescored lattices can be conveniently enhanced with other post-processing techniques to achieve cumulative improvement. Experiments with Mandarin LVCSR show that DLR improves efficiency: the computation time for 1000-best re-ranking is reduced more than threefold. The discriminatively rescored lattices are further processed by re-ranking with word-based mutual information (MI). While DLR achieves around 15% relative character error rate (CER) reduction over the recognizer baseline, the MI-based re-ranking brings a further 5% relative CER reduction over the DLR performance.
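The recasting described in the abstract can be illustrated with a minimal sketch. Under a common linear-model formulation (a perceptron-style score that combines the baseline LM log-probability with per-n-gram discriminative weights), weights on n-grams of order no higher than the baseline LM's order can be folded directly into the baseline log-probabilities, yielding an unnormalized "pseudo-conventional" n-gram model usable by any standard decoder or lattice rescorer. All names, values, and the additive combination below are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Baseline bigram log-probabilities: (history, word) -> log P(word | history).
# Tiny hypothetical LM for illustration only.
baseline_logprob = {
    ("<s>", "hello"): math.log(0.6),
    ("hello", "world"): math.log(0.5),
}

# Discriminatively trained per-bigram weights (perceptron-style),
# same order as the baseline LM, so they can be folded in.
disc_weight = {
    ("hello", "world"): 0.3,   # this bigram favors correct hypotheses
    ("<s>", "hello"): -0.1,    # this one slightly penalizes
}

def pseudo_logscore(history, word, lm_scale=1.0):
    """Pseudo-conventional n-gram score: baseline log-probability plus the
    discriminative weight (divided by the LM scale). The result is an
    unnormalized log-score, hence 'pseudo' rather than a true LM."""
    base = baseline_logprob.get((history, word), math.log(1e-6))
    return base + disc_weight.get((history, word), 0.0) / lm_scale

# The folded scores replace the baseline log-probabilities wherever the
# decoder or lattice rescorer would query the LM, so mature single-pass
# decoding or A*-based lattice search applies unchanged.
score = pseudo_logscore("hello", "world")
```

Because the combined model exposes the same (history, word) lookup interface as a conventional n-gram LM, existing lattice-rescoring machinery needs no modification; only the stored scores change.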
Keywords :
maximum likelihood estimation; speech recognition; A* algorithm; LVCSR; discriminative lattice rescoring; discriminative n-gram language modeling; large vocabulary continuous speech recognition; pseudo-conventional n-gram model; re-rank candidate recognition hypotheses; word-based mutual information; Character recognition; Error analysis; Hidden Markov models; Lattices; Maximum likelihood decoding; Maximum likelihood estimation; Mutual information; Natural languages; Speech recognition; Vocabulary; Discriminative N-gram Modeling; LVCSR;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518764
Filename :
4518764