Title of article :
An Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling
Author/Authors :
Masaharu Kato, Tetsuo Kosaka, Akinori Ito and Shozo Makino
Issue Information :
Journal issue, 2009
Abstract :
Topic-based stochastic models such as probabilistic latent semantic analysis (PLSA) are good tools for adapting a language model to a specific domain using a constraint of global context. A probability given by a topic model is combined with an n-gram probability using the unigram rescaling scheme. One practical problem in applying PLSA to speech recognition is that probability calculation with PLSA is computationally expensive, which prevents the topic-based language model from being incorporated into the decoding process. In this paper, we propose an algorithm that calculates a back-off n-gram probability with unigram rescaling quickly, without any approximation. The algorithm drastically reduces the cost of computing the normalizing factor, requiring probability calculations only for the words that appear in the current context. Experimental results showed that the proposed algorithm was more than 6000 times faster than the naive calculation method.
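The speed-up rests on the back-off structure: under unigram rescaling, the normalizing factor over the whole vocabulary can be folded into the back-off weight and the lower-order normalizer, leaving an explicit sum only over the words listed for the current context. The following Python sketch illustrates this factorization under simplifying assumptions (a toy ARPA-style back-off table; names such as BackoffLM, rescaled_normalizer and the ratio function are illustrative, not taken from the paper).

```python
# Illustrative sketch (not the paper's code): unigram rescaling of a
# back-off n-gram model with a normalizer that only sums explicitly
# listed words, folding the rest into the back-off weight.

class BackoffLM:
    """Toy ARPA-style back-off model.

    explicit[h] maps each word listed after history h to P(w | h);
    alpha[h] is the back-off weight of h; the empty history () holds
    the unigram distribution over the whole vocabulary.
    """

    def __init__(self, explicit, alpha):
        self.explicit = explicit   # dict: history tuple -> {word: prob}
        self.alpha = alpha         # dict: history tuple -> back-off weight

    def prob(self, word, history):
        """Plain back-off probability P(word | history)."""
        h = tuple(history)
        table = self.explicit.get(h, {})
        if word in table:
            return table[word]
        if not h:                  # unigram level: nothing to back off to
            return 0.0
        return self.alpha.get(h, 1.0) * self.prob(word, h[1:])


def rescaled_normalizer(lm, history, ratio, cache):
    """Z(h) = sum_w P(w | h) * ratio(w), with ratio(w) = P_topic(w) / P_unigram(w).

    For h != (), only the words explicitly listed under h are touched;
    the remaining mass is alpha(h) * (Z(h') - overlap), where h' is the
    shortened history, so the full-vocabulary sum is done only once,
    at the unigram level.
    """
    h = tuple(history)
    if h in cache:
        return cache[h]
    table = lm.explicit.get(h, {})
    if not h:
        # One pass over the vocabulary; reused by all longer histories.
        z = sum(p * ratio(w) for w, p in table.items())
    else:
        shorter = h[1:]
        z_lower = rescaled_normalizer(lm, shorter, ratio, cache)
        seen = sum(p * ratio(w) for w, p in table.items())
        # The lower-order mass of the explicitly listed words is not
        # backed off to, so subtract it before applying the weight.
        overlap = sum(lm.prob(w, shorter) * ratio(w) for w in table)
        z = seen + lm.alpha.get(h, 1.0) * (z_lower - overlap)
    cache[h] = z
    return z


def rescaled_prob(lm, word, history, ratio, cache):
    """Unigram-rescaled probability, computed exactly (no approximation)."""
    z = rescaled_normalizer(lm, history, ratio, cache)
    return lm.prob(word, history) * ratio(word) / z


if __name__ == "__main__":
    explicit = {
        (): {"a": 0.5, "b": 0.3, "c": 0.2},   # unigrams
        ("a",): {"b": 0.6, "c": 0.2},         # bigrams observed after "a"
    }
    alpha = {("a",): 0.4}                      # back-off weight for history ("a",)
    lm = BackoffLM(explicit, alpha)

    topic = {"a": 0.3, "b": 0.2, "c": 0.5}     # toy topic (PLSA-style) distribution
    ratio = lambda w: topic[w] / explicit[()][w]

    cache = {}
    probs = {w: rescaled_prob(lm, w, ("a",), ratio, cache) for w in "abc"}
    print(probs, sum(probs.values()))          # rescaled probabilities sum to 1
```

In a decoder, the per-history values Z(h) would typically be cached, with the unigram-level sum recomputed only when the topic distribution changes.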
Keywords :
back-off smoothing, probabilistic latent semantic analysis, n-gram, unigram rescaling
Journal title :
IAENG International Journal of Computer Science