A multispan language modeling framework for large vocabulary speech recognition

Author

Bellegarda, Jerome R.

Author_Institution

Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA

Volume

6

Issue

5

fYear

1998

fDate

9/1/1998

Firstpage

456

Lastpage

467

Abstract

A new framework is proposed to construct multispan language models for large vocabulary speech recognition, by exploiting both local and global constraints present in the language. While statistical n-gram modeling can readily take local constraints into account, global constraints have been more difficult to handle within a data-driven formalism. In this work, they are captured via a paradigm first formulated in the context of information retrieval, called latent semantic analysis (LSA). This paradigm seeks to automatically uncover the salient semantic relationships between words and documents in a given corpus. Such discovery relies on a parsimonious vector representation of each word and each document in a suitable, common vector space. Since in this space familiar clustering techniques can be applied, it becomes possible to derive several families of large-span language models, with various smoothing properties. Because of their semantic nature, the new language models are well suited to complement conventional, more syntactically oriented n-grams, and the combination of the two paradigms naturally yields the benefit of a multispan context. An integrative formulation is proposed for this purpose, in which the latent semantic information is used to adjust the standard n-gram probability. The performance of the resulting multispan language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.

Keywords

grammars; information retrieval; natural languages; probability; speech recognition; statistical analysis; clustering techniques; data-driven formalism; documents; global constraints; integrative formulation; large vocabulary speech recognition; large-span language models; latent semantic analysis; local constraints; multispan language modeling; n-gram performance; n-gram probability; perplexity; semantic relationships; smoothing properties; statistical n-gram modeling; vector representation; vector space; words; Acoustic applications; Context modeling; Frequency estimation; Information analysis; Smoothing methods; Vocabulary

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.709671

Filename

709671