A 0.75 kbps speech codec using recognition and synthesis schemes

Author

Chen, Heng-Chou ; Chen, Chin-Yung ; Tsou, Kui-Ming

Author_Institution

Dept. of Electr. Eng., Nat. Chung-Hsing Univ., Chia-Yi, Taiwan

fYear

1997

fDate

7-10 Sep 1997

Firstpage

27

Lastpage

28

Abstract

In this paper, we propose a very low bit-rate speech codec using recognition and synthesis schemes. A total of 2512 speech units, including 48 phones and 2463 diphones, are utilized in the recognition process. A three-state continuous hidden Markov model, excluding the start and final states, is applied to model each of these speech units. In addition to the recognized phonetic index, the corresponding phonetic frame length is also part of the compressed information. To obtain better quality in the reconstructed speech, pitch periods and pitch gains are realized to preserve the speaker's personal characteristics. In the synthesis process, the time-domain pitch-synchronous overlapped addition scheme is utilized to synthesize a high-quality speech waveform. In our tests, a recognition accuracy of more than 90% can be achieved when the user speaks in a normal manner. The reconstructed speech quality can be above a mean opinion score of 3.0, and a diagnostic rhyme test score of 92
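
As a rough illustration of the bit budget implied by the abstract, the sketch below computes how many bits a fixed-length index over the 2512-unit inventory would need, and how many units per second a 0.75 kbps stream could carry. The inventory size and bit rate come from the record; the side-information bit allocations (frame length, pitch period, pitch gain) are hypothetical values chosen only for illustration, and the abstract does not state the allocation actually used.

```python
NUM_UNITS = 2512      # speech-unit inventory size stated in the abstract
BIT_RATE_BPS = 750    # 0.75 kbps, from the title


def index_bits(num_units: int) -> int:
    """Smallest fixed-length code width (in bits) that can index num_units symbols."""
    if num_units < 1:
        raise ValueError("num_units must be positive")
    # The largest index is num_units - 1; bit_length gives the width needed for it.
    return max(1, (num_units - 1).bit_length())


def units_per_second(bit_rate_bps: float, bits_per_unit: int) -> float:
    """Average number of speech units that fit into one second of the bitstream."""
    return bit_rate_bps / bits_per_unit


# ASSUMPTION: hypothetical side-information widths, not taken from the paper.
ASSUMED_SIDE_BITS = {"frame_length": 6, "pitch_period": 7, "pitch_gain": 5}

unit_index_bits = index_bits(NUM_UNITS)
bits_per_unit = unit_index_bits + sum(ASSUMED_SIDE_BITS.values())
avg_units = units_per_second(BIT_RATE_BPS, bits_per_unit)

print(f"index bits per unit: {unit_index_bits}")
print(f"total bits per unit (with assumed side info): {bits_per_unit}")
print(f"units per second at {BIT_RATE_BPS} bps: {avg_units:.1f}")
```

Under these assumed allocations the phonetic index accounts for well under half of each unit's bits, which suggests why the encoding of the pitch information matters at this rate.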

Keywords

hidden Markov models; speech codecs; speech coding; speech recognition; speech synthesis; time-domain analysis; 0.75 kbit/s; compressed information; diagnostic rhyme test score; diphones; high-quality speech waveform; mean opinion score; personal characteristics; phones; phonetic frame length; phonetic index; pitch gains; pitch period; recognition; reconstructed speech quality; speech units; synthesis; three-state continuous hidden Markov model; time-domain pitch-synchronous overlapped addition scheme; very low bit-rate speech codec; linear predictive coding; signal processing; signal synthesis; speech processing; testing

fLanguage

English

Publisher

IEEE

Conference_Titel

1997 IEEE Workshop on Speech Coding for Telecommunications Proceedings

Conference_Location

Pocono Manor, PA

Print_ISBN

0-7803-4073-6

Type

conf

DOI

10.1109/SCFT.1997.623879

Filename

623879