• DocumentCode
    290055
  • Title

    A robust language model incorporating a substring parser and extended n-grams

  • Author

    Wright, J.H. ; Jones, G.J.F. ; Lloyd-Thomas, H.

  • Author_Institution
    Centre for Commun. Res., Bristol Univ., UK
  • Volume
    i
  • fYear
    1994
  • fDate
    19-22 Apr 1994
  • Abstract
    Describes a language model for speech recognition which incorporates a substring parser (to take advantage of syntactic structure covered by a context-free grammar) and extended bigrams (to take advantage of remote dependencies between words). The use of extended bigrams significantly reduces the perplexity, and a distribution clustering algorithm alleviates the additional storage cost. The substring parser is the foundation for training and scoring procedures based on paths at all levels through the syntactic structures, with subtrees linked by bigrams. The word bigram score is therefore absorbed into a grammar framework, consolidating the two kinds of language model, and again a significant reduction in perplexity is observed. The aim is an integrated, robust language model that is adaptive to the speaker.
  • Keywords
    computational linguistics; context-free grammars; learning (artificial intelligence); natural languages; speech recognition; context-free grammar; distribution clustering algorithm; extended bigrams; extended n-grams; grammar framework; perplexity; remote dependencies; robust language model; scoring; substring parser; syntactic structure; training; word bigram score; Buildings; Clustering algorithms; Context modeling; Costs; Natural languages; Positrons; Probability; Robustness; Speech recognition; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on
  • Conference_Location
    Adelaide, SA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-1775-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1994.389281
  • Filename
    389281