  • DocumentCode
    302107
  • Title
    Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling
  • Author
    Law, Hubert Hin-Cheung; Chan, Chorkin
  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Univ., Hong Kong
  • Volume
    1
  • fYear
    1996
  • fDate
    7-10 May 1996
  • Firstpage
    196
  • Abstract
    A novel ergodic multigram hidden Markov model (HMM) is introduced that models sentence production as a doubly stochastic process: word classes are first produced according to a first-order Markov model, and single- or multi-character words are then generated independently from those word classes, with no word boundaries marked in the sentence. The model can therefore be applied to languages that lack word boundary markers, such as Chinese. Given a lexicon that lists the syntactic classes of each word, its applications include language modeling for recognizers as well as integrated word segmentation and class tagging. Pre-segmented and tagged corpora are not needed for training, and segmentation and tagging are trained together within a single model. This paper presents the relevant algorithms for the model and reports experimental results on a Chinese news corpus.
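The decoding idea in the abstract (class transitions from a first-order Markov chain, words emitted from classes, boundaries left implicit) can be sketched as a dynamic program over character positions. The following is a minimal illustration under stated assumptions, not the authors' algorithm: the class set `N`/`V`, the tables `TRANS` and `EMIT`, the bound `MAX_WORD_LEN`, and the function names are all hypothetical, and the probabilities are invented toy values. `viterbi_segment_tag` picks the single best joint segmentation and class sequence, while `sentence_probability` sums over all of them, which is the quantity a language model for a recognizer would use.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy parameters for illustration only (not taken from the paper).
# Ergodic first-order class transitions P(class | previous class); "<s>" = sentence start.
TRANS: Dict[str, Dict[str, float]] = {
    "<s>": {"N": 0.6, "V": 0.4},
    "N": {"N": 0.3, "V": 0.7},
    "V": {"N": 0.8, "V": 0.2},
}
# Class-conditional emission of single- or multi-character words, P(word | class).
EMIT: Dict[str, Dict[str, float]] = {
    "N": {"中国": 0.4, "中": 0.1, "国": 0.1, "人": 0.4},
    "V": {"中": 0.5, "是": 0.5},
}
MAX_WORD_LEN = 4  # assumed upper bound on word length, in characters


def _arcs(sentence: str, i: int):
    """Yield (j, word, cls, p_word) for every lexicon word spanning sentence[i:j]."""
    for j in range(i + 1, min(len(sentence), i + MAX_WORD_LEN) + 1):
        word = sentence[i:j]
        for cls, vocab in EMIT.items():
            p_word = vocab.get(word, 0.0)
            if p_word > 0.0:
                yield j, word, cls, p_word


def viterbi_segment_tag(sentence: str) -> Tuple[List[Tuple[str, str]], float]:
    """Jointly choose the most probable segmentation and class sequence."""
    n = len(sentence)
    # best[i][cls] = (log-prob of best path covering sentence[:i] ending in cls, backpointer)
    best: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev_cls, (score, _) in best[i].items():
            for j, _word, cls, p_word in _arcs(sentence, i):
                p_trans = TRANS[prev_cls].get(cls, 0.0)
                if p_trans == 0.0:
                    continue
                cand = score + math.log(p_trans) + math.log(p_word)
                if cls not in best[j] or cand > best[j][cls][0]:
                    best[j][cls] = (cand, (i, prev_cls))
    if not best[n]:
        raise ValueError("no lexicon path covers the sentence")
    cls = max(best[n], key=lambda c: best[n][c][0])
    log_p = best[n][cls][0]
    # Follow backpointers from the end of the sentence to the start symbol.
    tagged: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        i, prev_cls = best[j][cls][1]
        tagged.append((sentence[i:j], cls))
        j, cls = i, prev_cls
    tagged.reverse()
    return tagged, math.exp(log_p)


def sentence_probability(sentence: str) -> float:
    """Forward pass: sum over all segmentations and class sequences."""
    n = len(sentence)
    alpha: List[Dict[str, float]] = [{} for _ in range(n + 1)]
    alpha[0]["<s>"] = 1.0
    for i in range(n):
        for prev_cls, a in alpha[i].items():
            for j, _word, cls, p_word in _arcs(sentence, i):
                p = a * TRANS[prev_cls].get(cls, 0.0) * p_word
                alpha[j][cls] = alpha[j].get(cls, 0.0) + p
    return sum(alpha[n].values())
```

With these toy tables, `viterbi_segment_tag("中国人")` returns `[("中国", "N"), ("人", "N")]`, preferring the two-character word over splitting it. Training is not shown: the abstract states that no pre-segmented or tagged corpus is required, so `TRANS` and `EMIT` would have to be re-estimated from raw text, and this sketch does not attempt to reproduce the paper's procedure for that.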
  • Keywords
    hidden Markov models; natural languages; speech recognition; stochastic processes; Chinese language modeling; Chinese news corpus; class tagging; doubly stochastic process; ergodic multigram HMM; hidden Markov model; lexicon; multi-character words; sentence production; single character words; syntactic classes; word segmentation; Computer science; Hidden Markov models; Lattices; Maximum likelihood decoding; Natural languages; Production; Stochastic processes; Tagging; Terminology; Viterbi algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings, 1996 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1996.540324
  • Filename
    540324