Improving the performance of HMM-based very low bit rate speech coding

Author

Hoshiya, Takahiro ; Sako, Shinji ; Zen, Heiga ; Tokuda, Keiichi ; Masuko, Takashi ; Kobayashi, Takao ; Kitamura, Tadashi

Author_Institution

Dept. of Comput. Sci., Nagoya Inst. of Technol., Japan

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

In this paper, we define an F0 quantization scheme for a very low bit rate speech coder based on hidden Markov models (HMMs). In the coding system, the encoder carries out phoneme recognition and transmits phoneme indices, state durations, and F0 information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indices, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMMs. Finally, we obtain synthetic speech by applying the MLSA (mel log spectrum approximation) filter according to the mel-cepstral coefficients and F0 information. In addition to F0 quantization, we investigate encoding methods for the other parameters that reduce the bit rate while maintaining subjective speech quality. A subjective listening test shows that the proposed coder at about 100 to 150 bit/s outperforms a VQ-based vocoder at 600 bit/s (mel-cepstrum: 6 bit/frame × 50 frame/s; F0: 6 bit/frame × 50 frame/s).
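
The bit-rate comparison in the abstract can be checked with a short calculation. The following Python sketch uses only the figures quoted above; the function and variable names are illustrative assumptions, not taken from the paper, and the 100 and 150 bit/s endpoints are the approximate range stated in the abstract.

```python
def bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate in bit/s for one parameter stream (hypothetical helper)."""
    return bits_per_frame * frames_per_second


# Baseline VQ-based vocoder, using the figures quoted in the abstract.
mcep_rate = bit_rate(6, 50)  # mel-cepstrum stream: 6 bit/frame at 50 frame/s
f0_rate = bit_rate(6, 50)    # F0 stream: 6 bit/frame at 50 frame/s
baseline_rate = mcep_rate + f0_rate

# Approximate operating range of the proposed coder, as stated in the abstract.
proposed_low, proposed_high = 100, 150

# How many times lower the proposed coder's bit rate is than the baseline.
ratio_at_high = baseline_rate / proposed_high
ratio_at_low = baseline_rate / proposed_low

print(f"baseline: {baseline_rate} bit/s")
print(f"reduction factor: {ratio_at_high:.1f}x to {ratio_at_low:.1f}x")
```

Under these figures, the two baseline streams sum to the 600 bit/s quoted for the VQ-based vocoder, so the proposed coder operates at roughly one quarter to one sixth of the baseline bit rate.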

Keywords

cepstral analysis; decoding; hidden Markov models; speech coding; speech recognition; speech synthesis; vocoders; F0 quantization scheme; MLSA filter; decoder; encoder; encoding methods; hidden Markov model; mel log spectrum approximation; mel-cepstral coefficient vectors; performance; phoneme HMM concatenation; phoneme indices; phoneme recognition; speech coder; state durations; subjective listening test; subjective speech quality; synthetic speech; very low bit rate speech coding; Bit rate; Concatenated codes; Decoding; Encoding; Hidden Markov models; Information filtering; Information filters; Quantization; Speech coding; Testing;

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198902

Filename

1198902