  • DocumentCode
    590665
  • Title
    Expansion of training texts to generate a topic-dependent language model for meeting speech recognition
  • Author
    Egashira, K. ; Kojima, Keisuke ; Yamashita, Masaru ; Yamauchi, Kazuto ; Matsunaga, Shinichiro
  • Author_Institution
    Nagasaki Univ., Nagasaki, Japan
  • fYear
    2012
  • fDate
    3-6 Dec. 2012
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper proposes methods for expanding a baseline training text corpus to generate a topic-dependent language model for more accurate recognition of meeting speech. Preparing a universal language model that can cope with the variety of topics discussed in meetings is very difficult. Our strategy is to generate topic-dependent training texts using two methods. The first is text collection from web pages using queries that consist of topic-dependent confident terms; these terms were selected from preparatory recognition results based on the TF-IDF (TF: term frequency; IDF: inverse document frequency) value of each term. The second is text generation using participants' names. Our topic-dependent language model was generated from these new texts and the baseline corpus. Compared with the language model trained only on the baseline corpus, the language model generated by the proposed strategy reduced perplexity by 16.4% and the out-of-vocabulary rate by 37.5%. This improvement was also confirmed in meeting speech recognition.
  • Keywords
    speech recognition; vocabulary; TF-IDF; Web page; baseline corpus; meeting speech recognition; term frequency inverse document frequency; text collection; text generation technique; topic-dependent confident term query; topic-dependent language model; topic-dependent training text; Acoustics; Adaptation models; Hidden Markov models; Speech; Speech recognition; Text recognition; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific
  • Conference_Location
    Hollywood, CA
  • Print_ISBN
    978-1-4673-4863-8
  • Type
    conf
  • Filename
    6411812
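The abstract's first expansion method ranks terms from a preparatory recognition result by TF-IDF and uses the top-ranked ones as web search queries. The record does not give the paper's exact TF and IDF definitions, the reference collection used for document frequencies, how recognizer confidence is used, or how many terms make up a query. The Python sketch below therefore fills these in with assumptions. TF is taken as the relative frequency of a term in the recognition hypothesis. IDF is the smoothed variant log((1 + N) / (1 + df)) + 1 computed over a hypothetical set of reference documents. The query is simply the k best-scoring non-stopword terms. The function names `tf_idf_scores` and `build_query` and the toy data are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def tf_idf_scores(target_tokens: Sequence[str],
                  reference_docs: Iterable[Sequence[str]]) -> Dict[str, float]:
    """Score each distinct term of a tokenized recognition hypothesis by TF-IDF.

    Assumptions (not taken from the paper):
      * TF  = count of the term in the hypothesis / hypothesis length
      * IDF = log((1 + N) / (1 + df)) + 1, a common smoothed variant,
              where N is the number of reference documents and df is the
              number of reference documents containing the term.
    """
    ref_sets = [set(doc) for doc in reference_docs]
    n_docs = len(ref_sets)
    counts = Counter(target_tokens)
    total = len(target_tokens)
    scores: Dict[str, float] = {}
    for term, count in counts.items():
        df = sum(1 for doc in ref_sets if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    return scores


def build_query(target_tokens: Sequence[str],
                reference_docs: Iterable[Sequence[str]],
                k: int = 3,
                stopwords: Iterable[str] = ()) -> str:
    """Join the k highest-scoring non-stopword terms into a search query string."""
    stop = set(stopwords)
    scores = tf_idf_scores(target_tokens, reference_docs)
    ranked = sorted(
        (term for term in scores if term not in stop),
        # Highest score first; ties broken alphabetically for a deterministic result.
        key=lambda term: (-scores[term], term),
    )
    return " ".join(ranked[:k])


if __name__ == "__main__":
    # Toy, made-up data: a tokenized "preparatory recognition result" and
    # three reference documents used only for document frequencies.
    hypothesis = "the budget for the robot project and the robot sensor budget".split()
    reference = [
        "the meeting starts at noon".split(),
        "the project schedule was discussed".split(),
        "lunch and coffee for the team".split(),
    ]
    print(build_query(hypothesis, reference, k=3, stopwords={"the", "and", "for"}))
    # prints: budget robot sensor
```

Terms that are frequent in the meeting hypothesis but absent from the reference documents (here `budget` and `robot`) rise to the top, which is the intuition behind using TF-IDF to pick topic-dependent terms. Ties are broken alphabetically only so that the output is deterministic.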