  • DocumentCode
    3427466
  • Title
    An unsupervised web-based topic language model adaptation method
  • Author
    Lecorvé, Gwénolé; Gravier, Guillaume; Sébillot, Pascale
  • Author_Institution
    IRISA, Rennes
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    5081
  • Lastpage
    5084
  • Abstract
    This paper addresses the adaptation of automatic speech recognition (ASR) systems, whose language models (LMs) are usually trained on topic-independent corpora, to new topics, particularly in broadcast news. We propose a complete, fully unsupervised technique that selects keywords from each segment using information retrieval methods and uses them to build a thematically coherent adaptation corpus from the Internet. The LM used for the initial transcription is then adapted before word lattices are rescored. Experimental results show a significant reduction in perplexity after LM adaptation. Word error rates also improve in some cases, though to a lesser extent.
  • Keywords
    Internet; natural language processing; query processing; speech recognition; information retrieval; natural languages; rescoring word lattices; topic-independent corpora; unsupervised Web-based topic language model; word error rates; Adaptation model; Automatic speech recognition; Broadcasting; Error analysis; Lattices; Streaming media; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518801
  • Filename
    4518801