• DocumentCode
    2701034
  • Title

    Data Driven Approach for Language Model Adaptation using Stepwise Relative Entropy Minimization

  • Author

    Sethy, Abhinav ; Narayanan, Shrikanth ; Ramabhadran, Bhuvana

  • Author_Institution
    Dept. of Electr. Eng.-Syst., Southern California Univ., CA, USA
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    The ability to build domain- and task-specific language models from large generic text corpora is of considerable interest to the language modeling community. One of the key challenges is to identify the relevant text material in the collection. The text selection problem can be cast in a semi-supervised learning framework. Motivated by recent advances in semi-supervised learning that emphasize the need for balanced label assignments, we present a stepwise relative entropy minimization scheme that selects a set of sentences as a whole rather than selecting sentences solely on their individual merit. Our results on the IBM European Parliament Plenary Speech (EPPS) transcription system show a significant performance improvement (0.5% on an 8.9% baseline) with just a seventh of the out-of-domain data. The IBM EPPS LVCSR system, which has a 60K vocabulary, is a particularly hard baseline for out-of-domain adaptation because of its low WER with in-domain training data.
  • Keywords
    learning (artificial intelligence); natural language processing; speech recognition; IBM European Parliament Plenary Speech; balanced label assignments; data driven approach; language model adaptation; semi-supervised learning framework; speech recognition; stepwise relative entropy minimization; Adaptation model; Data engineering; Entropy; Humans; Natural languages; Semisupervised learning; Speech analysis; Speech recognition; Speech synthesis; Viterbi algorithm; Language model adaptation; TC-STAR; relative entropy; speech recognition; text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type

    conf

  • DOI
    10.1109/ICASSP.2007.367192
  • Filename
    4218066