• DocumentCode
    3166807
  • Title
    Semi-supervised discriminative language modeling for Turkish ASR
  • Author
    Çelebi, A. ; Sak, H. ; Dikici, E. ; Saraçlar, M. ; Lehr, M. ; Prud'hommeaux, E. ; Xu, P. ; Glenn, N. ; Karakos, D. ; Khudanpur, S. ; Roark, B. ; Sagae, K. ; Shafran, I. ; Bikel, D. ; Callison-Burch, C. ; Cao, Y. ; Hall, K. ; Hasler, E. ; Koehn, P. ; Lopez
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5025
  • Lastpage
    5028
  • Abstract
    We present our work on semi-supervised learning of discriminative language models, where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities: word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models, combined with a sample selection strategy that aims to match the error distribution of the baseline ASR system, give the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.
  • Keywords
    learning (artificial intelligence); natural language processing; signal sampling; speech recognition; Turkish ASR; automatic speech recognition; morph-based confusion models; perceptron algorithm; sample selection strategy; semisupervised discriminative language modeling; semisupervised learning; supervised training; Acoustics; Computational modeling; Lattices; Semisupervised learning; Speech; Speech recognition; Training; Confusion Modeling; Discriminative Training; Language Modeling; Semi-supervised Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289049
  • Filename
    6289049