Representing speech

Author

Kleijn, W. Bastiaan

Author_Institution

Department of Speech, Music and Hearing, KTH (Royal Institute of Technology), 100 44 Stockholm, Sweden

fYear

2000

fDate

4-8 Sept. 2000

Firstpage

Lastpage

Abstract

The properties of the speech production process and the auditory periphery have led to the use of similar speech signal representations for various processing tasks such as speech and speaker recognition, speech synthesis, and speech coding. The representation is generally divided into a description of the vocal-tract transfer function and a description of the excitation source. For recognition purposes, the biased characterization of the vocal-tract transfer function by a time sequence of low-dimensional cepstral vectors performs well. For coding and synthesis, we argue that autoregressive (AR) models are more effective than filter banks for the vocal-tract transfer function, while pitch-synchronous filter banks and modulation-domain filters are most effective for the excitation source. A clear trend exists towards the exploitation of the time variation of both the vocal-tract transfer function and the excitation source.

Keywords

Mel frequency cepstral coefficient; Modulation; Speech; Speech processing; Speech recognition; Transfer functions

fLanguage

English

Publisher

IEEE

Conference_Title

Signal Processing Conference, 2000 10th European

Conference_Location

Tampere, Finland

Print_ISBN

978-952-1504-43-3

Type

conf

Filename

7075841

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=696995