• DocumentCode
    2601146
  • Title

    A 0.75 kbps speech codec using recognition and synthesis schemes

  • Author

    Heng-Chou Chen ; Chin-Yung Chen ; Kui-Ming Tsou

  • Author_Institution
    Dept. of Electr. Eng., Nat. Chung-Hsing Univ., Chia-Yi, Taiwan
  • fYear
    1997
  • fDate
    7-10 Sep 1997
  • Firstpage
    27
  • Lastpage
    28
  • Abstract
    In this paper, we propose a very low bit-rate speech codec using recognition and synthesis schemes. The recognition process uses 2512 speech units, including 48 phones and 2463 diphones. Each speech unit is modeled by a three-state continuous hidden Markov model, not counting the start and final states. In addition to the recognized phonetic index, the corresponding phonetic frame length is also part of the compressed information. To obtain better quality in the reconstructed speech, pitch periods and pitch gains are retained to preserve the speaker's personal characteristics. In the synthesis process, the time-domain pitch-synchronous overlapped addition scheme is used to synthesize a high-quality speech waveform. In our tests, a recognition accuracy of more than 90% can be achieved when the user speaks normally. The reconstructed speech quality can exceed a mean opinion score of 3.0 and reach a diagnostic rhyme test score of 92
  • Keywords
    hidden Markov models; speech codecs; speech coding; speech recognition; speech synthesis; time-domain analysis; 0.75 kbit/s; compressed information; diagnostic rhyme test score; diphones; high-quality speech waveform; mean opinion score; personal characteristics; phones; phonetic frame length; phonetic index; pitch gains; pitch period; recognition; reconstructed speech quality; speech units; synthesis; three-state continuous hidden Markov model; time-domain pitch-synchronous overlapped addition scheme; very low bit-rate speech codec; Hidden Markov models; Linear predictive coding; Signal processing; Signal synthesis; Speech codecs; Speech coding; Speech processing; Speech recognition; Speech synthesis; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    1997 IEEE Workshop on Speech Coding for Telecommunications Proceedings
  • Conference_Location
    Pocono Manor, PA
  • Print_ISBN
    0-7803-4073-6
  • Type

    conf

  • DOI
    10.1109/SCFT.1997.623879
  • Filename
    623879
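
The bit budget implied by the abstract can be sketched with a little arithmetic. The unit count (2512) and the bit rate (0.75 kbps) come from the record. The fixed-length index coding and the rate of 10 units per second are assumptions made here for illustration only; the record does not state how the index is coded or how many units are sent per second.

```python
NUM_UNITS = 2512      # speech units stated in the abstract
BIT_RATE_BPS = 750    # 0.75 kbps, from the title


def index_bits(num_units: int) -> int:
    """Minimum bits for a fixed-length index over num_units symbols."""
    if num_units < 1:
        raise ValueError("num_units must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float error.
    return (num_units - 1).bit_length()


def bits_per_unit_budget(bit_rate_bps: float, units_per_second: float) -> float:
    """Average number of bits available per transmitted unit."""
    return bit_rate_bps / units_per_second


# Hypothetical unit rate, NOT from the record; chosen only to make the
# arithmetic concrete.
ASSUMED_UNITS_PER_SECOND = 10.0

if __name__ == "__main__":
    bits = index_bits(NUM_UNITS)
    budget = bits_per_unit_budget(BIT_RATE_BPS, ASSUMED_UNITS_PER_SECOND)
    print(f"index bits per unit: {bits}")
    print(f"budget per unit at the assumed rate: {budget:.1f} bits")
    print(f"left for frame length and pitch data: {budget - bits:.1f} bits")
```

Under these assumptions, a fixed-length index over 2512 units needs 12 bits, and whatever remains of the per-unit budget would have to cover the frame length, pitch periods, and pitch gains mentioned in the abstract.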