  • DocumentCode
    179360
  • Title
    A bin-based ontological framework for low-resource n-gram smoothing in language modelling
  • Author
    Benahmed, Y. ; Selouani, Sid-Ahmed ; O'Shaughnessy, D.
  • Author_Institution
    INRS-EMT, Montréal, QC, Canada
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    4918
  • Lastpage
    4922
  • Abstract
    In this paper, we introduce a novel method for smoothing language models (LMs), especially adapted to limited-resource language modeling, that is based on the semantic information found in ontologies. We exploit the latent knowledge of language that is deeply encoded within ontologies. As such, this work examines the potential of using the semantic and syntactic relations between words from the WordNet ontology to generate new plausible contexts for unseen events, thereby simulating a larger corpus. These unseen events are then mixed with a baseline Witten-Bell (WB) LM to improve its performance in terms of both language model perplexity and automatic speech recognition word error rate. Results indicate a significant reduction in the perplexity of the language model (up to 9.85% relative), while the word error rate is also reduced in a statistically significant manner compared to both the original WB LM and a baseline Kneser-Ney smoothed LM on the Wall Street Journal-based Continuous Speech Recognition Phase II corpus.
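The abstract relies on Witten-Bell smoothing, LM perplexity and relative perplexity reduction without defining them. The Python sketch below illustrates only those baseline pieces: an interpolated Witten-Bell bigram model, perplexity, linear interpolation of two models, and a relative-reduction helper. It does not implement the paper's bin-based ontological method. The linear interpolation is an assumption standing in for the "mixing" step, whose exact form the abstract does not give, and all class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram LM (illustrative sketch, hypothetical name).

    The unigram level is itself Witten-Bell smoothed against a uniform
    distribution over a closed vocabulary, so every vocabulary word gets
    non-zero probability.
    """

    def __init__(self, sentences, vocab):
        self.unigrams = Counter()             # c(w), excluding "<s>"
        self.bigrams = Counter()              # c(h, w)
        self.followers = defaultdict(set)     # distinct successors of each history h
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigrams.update(tokens[1:])  # "<s>" is never predicted
            for h, w in zip(tokens, tokens[1:]):
                self.bigrams[(h, w)] += 1
                self.followers[h].add(w)
        self.history_counts = Counter()       # c(h) = sum over w of c(h, w)
        for (h, _), c in self.bigrams.items():
            self.history_counts[h] += c
        # Closed vocabulary: the supplied words plus everything seen in training.
        self.vocab = set(vocab) | set(self.unigrams)
        self.n_tokens = sum(self.unigrams.values())
        self.n_types = len(self.unigrams)

    def p_unigram(self, w):
        # (c(w) + T * 1/|V|) / (N + T), with T = number of seen word types.
        uniform = 1.0 / len(self.vocab)
        return (self.unigrams[w] + self.n_types * uniform) / (self.n_tokens + self.n_types)

    def p(self, w, h):
        # (c(h, w) + T(h) * P_uni(w)) / (c(h) + T(h)), T(h) = distinct successors of h.
        c_h = self.history_counts[h]
        if c_h == 0:
            return self.p_unigram(w)          # unseen history: use the unigram level
        t_h = len(self.followers[h])
        return (self.bigrams[(h, w)] + t_h * self.p_unigram(w)) / (c_h + t_h)


def interpolate(p_a, p_b, lam):
    """Linear interpolation lam * P_a + (1 - lam) * P_b (an assumed mixing scheme)."""
    return lambda w, h: lam * p_a(w, h) + (1.0 - lam) * p_b(w, h)


def perplexity(prob, sentences):
    """exp of the average negative log-probability per predicted token (incl. "</s>")."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(w, h))
            n += 1
    return math.exp(-log_sum / n)


def relative_reduction(baseline, new):
    """Relative reduction in percent, e.g. of perplexity or word error rate."""
    return 100.0 * (baseline - new) / baseline
```

For example, a model trained on the original text and a second model trained on extra generated contexts (built over the same vocabulary) could be combined with `interpolate(base.p, extra.p, 0.8)` and compared through `relative_reduction(perplexity(base.p, test), perplexity(mixed.p, test))`; the weight 0.8 is arbitrary and would normally be tuned on held-out data.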
  • Keywords
    natural language processing; ontologies (artificial intelligence); speech recognition; LM smoothing; Wall Street Journal; WordNet ontology; automatic speech recognition word error rates; baseline WB LM; baseline Witten-Bell LM; bin-based ontological framework; continuous speech recognition phase II corpus; in ontologies; language model perplexity; language model smoothing; limited-resources language modeling; low-resource N-gram smoothing; plausible context generation; semantic relation; syntactic relation; unseen events; word error rate reduction; Automatic speech recognition; Computational modeling; Ontologies; Optimization; Smoothing methods; Speech; Language modeling; context modeling; low-resource speech recognition; ontologies;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854537
  • Filename
    6854537