• DocumentCode
    394358
  • Title
    Improving the performance of HMM-based very low bit rate speech coding
  • Author
    Hoshiya, Takahiro ; Sako, Shinji ; Zen, Heiga ; Tokuda, Keiichi ; Masuko, Takashi ; Kobayashi, Takao ; Kitamura, Tadashi
  • Author_Institution
    Dept. of Comput. Sci., Nagoya Inst. of Technol., Japan
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    In this paper, we define an F0 quantization scheme for a very low bit rate speech coder based on HMMs (hidden Markov models). In the coding system, the encoder performs phoneme recognition and transmits phoneme indices, state durations and F0 information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indices, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMMs. Finally, synthetic speech is obtained with the MLSA (mel log spectrum approximation) filter, using the mel-cepstral coefficients and F0 information. In addition to F0 quantization, we investigate encoding methods for the other parameters that reduce the bit rate while maintaining subjective speech quality. A subjective listening test shows that the proposed coder at about 100∼150 bit/s outperforms a VQ-based vocoder at 600 bit/s (mel-cepstrum: 6 bit/frame × 50 frame/s, F0: 6 bit/frame × 50 frame/s).
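    The decoder step described in the abstract (concatenating phoneme HMMs by the transmitted indices and expanding them by the transmitted state durations) and the 600 bit/s baseline arithmetic can be illustrated with a minimal sketch. It assumes, hypothetically, that each phoneme HMM is stored as a list of per-state mean mel-cepstral vectors and that durations are given in frames; the names `decode_mel_cepstrum`, `frame_based_bit_rate` and the toy models are illustrative only. This is not the authors' implementation: it simply holds each state mean for its duration, whereas an HMM-based system would generate smooth trajectories (e.g. using dynamic features), and it leaves out F0 handling and the MLSA filter entirely.

```python
"""Toy sketch of the decoder side of an HMM-based phonetic vocoder.

Hypothetical assumptions: each phoneme HMM is a list of per-state mean
mel-cepstral vectors, and state durations are counted in frames. Each
state simply repeats its mean; smooth trajectory generation, F0 and the
MLSA synthesis filter are deliberately omitted.
"""
from typing import Dict, List, Sequence

Vector = List[float]


def decode_mel_cepstrum(
    phoneme_indices: Sequence[int],
    state_durations: Sequence[Sequence[int]],
    hmm_means: Dict[int, List[Vector]],
) -> List[Vector]:
    """Concatenate phoneme HMMs and expand each state by its duration."""
    if len(phoneme_indices) != len(state_durations):
        raise ValueError("one duration list per phoneme is required")
    frames: List[Vector] = []
    for phoneme, durations in zip(phoneme_indices, state_durations):
        states = hmm_means[phoneme]  # per-state mean vectors of this phoneme HMM
        if len(durations) != len(states):
            raise ValueError(f"phoneme {phoneme}: expected {len(states)} durations")
        for mean, n_frames in zip(states, durations):
            # Step-wise approximation: hold the state mean for n_frames frames.
            frames.extend([list(mean) for _ in range(n_frames)])
    return frames


def frame_based_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate (bit/s) of a stream coded with a fixed number of bits per frame."""
    return bits_per_frame * frames_per_second


# Toy HMMs (hypothetical): phoneme indices 0 and 1, two states each, 2-dim cepstra.
toy_hmms = {
    0: [[1.0, 0.1], [0.8, 0.2]],
    1: [[0.5, -0.3], [0.4, -0.1]],
}
mcep = decode_mel_cepstrum([0, 1], [[2, 1], [1, 3]], toy_hmms)
print(len(mcep))  # 7 frames in total (2 + 1 + 1 + 3)

# VQ-based baseline from the abstract: 6 bit/frame x 50 frame/s for each stream.
vq_rate = frame_based_bit_rate(6, 50) + frame_based_bit_rate(6, 50)
print(vq_rate)  # 600
```

    The baseline figure follows directly from the abstract's parenthetical: 6 × 50 = 300 bit/s for mel-cepstrum plus 300 bit/s for F0 gives 600 bit/s. The proposed coder avoids per-frame spectral coding because only phoneme indices, state durations and F0 information are transmitted.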
  • Keywords
    cepstral analysis; decoding; hidden Markov models; speech coding; speech recognition; speech synthesis; vocoders; F0 quantization scheme; MLSA filter; decoder; encoder; encoding methods; hidden Markov model; mel log spectrum approximation; mel-cepstral coefficient vectors; performance; phoneme HMM concatenation; phoneme indices; phoneme recognition; speech coder; state durations; subjective listening test; subjective speech quality; synthetic speech; very low bit rate speech coding; Bit rate; Concatenated codes; Decoding; Encoding; Hidden Markov models; Information filtering; Information filters; Quantization; Speech coding; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198902
  • Filename
    1198902