  • DocumentCode
    542298
  • Title
    Language model adaptation through topic decomposition and MDI estimation
  • Author
    Federico, Marcello
  • Author_Institution
    ITC-irst - Centro per la Ricerca Scientifica e Tecnologica, I-38050 Povo di Trento, Italy
  • Volume
    1
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    This work presents a language model adaptation method that combines the latent semantic analysis framework with the minimum discrimination information (MDI) estimation criterion. In particular, an unsupervised topic model decomposition is built that makes it possible to infer topic-related word distributions from very short adaptation texts. The resulting word distribution is then used to constrain the estimation of a minimum-divergence trigram language model. Compared with previous work, implementation details are discussed that make the approach effective for large-scale applications. Experimental results are provided for a digital library indexing task, namely the speech transcription of five historical documentary films. By adapting a trigram language model from very terse content descriptions (at most ten words) available for each film, a relative word error rate reduction of 3.2% was achieved.
  • Keywords
    Cities and towns; Erbium; Films; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2002.5743832
  • Filename
    5743832
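The abstract compresses two steps: inferring a topic-related word distribution from a terse description, then using it as a constraint when re-estimating the trigram model. A minimal Python sketch of that pipeline follows, under stated assumptions that are not taken from the record: the topic model is assumed to be a PLSA-style mixture whose word distributions P(w|z) are already trained; the topic weights of the short description are estimated by EM "folding-in"; and the MDI step uses the closed-form unigram-rescaling solution commonly used for MDI adaptation with a unigram constraint, P_A(w|h) = P_B(w|h) alpha(w) / Z(h) with alpha(w) = (P_A(w)/P_B(w))^beta. The function names, the exponent `beta`, and all toy numbers are hypothetical illustrations, not the paper's implementation.

```python
"""Sketch: topic-based unigram inference + MDI (unigram-rescaling) adaptation.

All names, the PLSA-style topic model, beta, and the toy numbers are
hypothetical illustrations, not taken from the paper.
"""
from typing import Dict, List

Dist = Dict[str, float]


def fold_in_topics(text: List[str], topic_word: List[Dist], iters: int = 50) -> List[float]:
    """Estimate topic weights P(z|d) for a short text by EM 'folding-in',
    keeping the (already trained) topic-word distributions P(w|z) fixed."""
    k = len(topic_word)
    weights = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for w in text:
            # E-step: P(z|w,d) is proportional to P(z|d) * P(w|z).
            joint = [weights[z] * topic_word[z].get(w, 0.0) for z in range(k)]
            total = sum(joint)
            if total == 0.0:
                continue  # word unseen by every topic: it carries no evidence
            for z in range(k):
                expected[z] += joint[z] / total
        # M-step: new topic weights are the normalised expected counts.
        norm = sum(expected)
        if norm == 0.0:
            break
        weights = [c / norm for c in expected]
    return weights


def adapted_unigram(weights: List[float], topic_word: List[Dist], vocab: List[str]) -> Dist:
    """P_A(w) = sum_z P(z|d) P(w|z): the topic-related word distribution."""
    return {w: sum(weights[z] * topic_word[z].get(w, 0.0) for z in range(len(weights)))
            for w in vocab}


def mdi_adapt(p_bg_given_h: Dist, p_bg: Dist, p_ad: Dist, beta: float = 0.5) -> Dist:
    """Rescale the background distribution P_B(.|h) for one history h:
    P_A(w|h) = P_B(w|h) * alpha(w) / Z(h), alpha(w) = (P_A(w) / P_B(w)) ** beta.
    Assumes p_bg and p_ad are smoothed (strictly positive) over the vocabulary."""
    alpha = {w: (p_ad[w] / p_bg[w]) ** beta for w in p_bg_given_h}
    z = sum(p_bg_given_h[w] * alpha[w] for w in p_bg_given_h)  # normaliser Z(h)
    return {w: p_bg_given_h[w] * alpha[w] / z for w in p_bg_given_h}


# Toy data (hypothetical): two topics over a four-word vocabulary.
vocab = ["film", "war", "city", "music"]
topic_word = [
    {"film": 0.10, "war": 0.60, "city": 0.20, "music": 0.10},
    {"film": 0.30, "war": 0.05, "city": 0.15, "music": 0.50},
]
description = ["war", "city", "war"]  # a terse adaptation text
p_bg = {w: 0.25 for w in vocab}  # background unigram P_B(w)
p_bg_given_h = {"film": 0.4, "war": 0.2, "city": 0.3, "music": 0.1}  # P_B(w|h)

weights = fold_in_topics(description, topic_word)
p_ad = adapted_unigram(weights, topic_word, vocab)
p_adapted_given_h = mdi_adapt(p_bg_given_h, p_bg, p_ad)
print(weights)
print(p_adapted_given_h)
```

With these toy numbers the description shifts weight toward the first topic, so the adapted P(war|h) rises above its background value of 0.2 while P(music|h) falls below 0.1; setting `beta` to 0 leaves the background distribution unchanged. In a real back-off trigram model the normaliser Z(h) would need to be computed efficiently rather than by summing over the whole vocabulary as this sketch does.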