• DocumentCode
    10647
  • Title

    Syntactic and Semantic Features For Code-Switching Factored Language Models

  • Author

    Adel, Heike; Vu, Ngoc Thang; Kirchhoff, Katrin; Telaar, Dominic; Schultz, Tanja

  • Author_Institution
    Center for Information and Language Processing (CIS), University of Munich, Munich, Germany
  • Volume
    23
  • Issue
    3
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    431
  • Lastpage
    440
  • Abstract
    This paper presents our latest investigations into features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features that can be extracted from Code-Switching text data and integrate them into factored language models. Different possible factors, such as words, part-of-speech tags, Brown word clusters, open-class words, and clusters of open-class word embeddings, are explored. The experimental results reveal that Brown word clusters, part-of-speech tags, and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In ASR experiments, the model containing Brown word clusters and part-of-speech tags and the model also including clusters of open-class word embeddings yield the best mixed error rate results. In summary, the best language model significantly reduces the perplexity on the SEAME evaluation set by up to 10.8% relative and the mixed error rate by up to 3.4% relative.
  • Keywords
    natural language processing; speech recognition; ASR performance; Brown word clusters; Mandarin-English code-switching corpus; SEAME; automatic speech recognition; code-switching factored language models; code-switching speech; code-switching text data; open class word embeddings; part-of-speech tags; semantic features; syntactic features; Context; IEEE transactions; Semantics; Speech; Speech processing; Training; Vectors; Automatic speech recognition (ASR); recurrent neural networks
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2389622
  • Filename
    7005440