Title :
The Deterministic Plus Stochastic Model of the Residual Signal and Its Applications
Author :
Drugman, Thomas ; Dutoit, Thierry
Author_Institution :
TCTS Lab., Univ. of Mons, Mons, Belgium
fDate :
3/1/2012
Abstract :
The modeling of speech production often relies on a source-filter approach. Although methods for parameterizing the filter have reached a certain maturity, several speech processing applications still stand to gain from an appropriate excitation model. This manuscript presents a Deterministic plus Stochastic Model (DSM) of the residual signal. The DSM consists of two contributions acting in two distinct spectral bands delimited by a maximum voiced frequency. Both components are extracted from an analysis performed on a speaker-dependent dataset of pitch-synchronous residual frames. The deterministic part models the low-frequency content and arises from an orthonormal decomposition of these frames. The stochastic component is a high-frequency noise modulated in both time and frequency. Some phonetic and computational properties of the DSM are also highlighted. The applicability of the DSM in two fields of speech processing is then studied. First, it is shown that incorporating the DSM vocoder in HMM-based speech synthesis enhances the delivered quality: the proposed approach significantly outperforms the traditional pulse excitation and provides a quality equivalent to STRAIGHT. Second, the potential of glottal signatures derived from the proposed DSM is investigated for speaker identification. These signatures are shown to lead to better recognition rates than other glottal-based methods.
Keywords :
filtering theory; hidden Markov models; speaker recognition; speech synthesis; HMM-based speech synthesis; appropriate excitation model; computational properties; deterministic plus stochastic model; glottal signatures; low-frequency contents; maximum voiced frequency; orthonormal decomposition; phonetic properties; pitch-synchronous residual frames; recognition rates; residual signal; source-filter approach; speaker identification purpose; speaker-dependent dataset; spectral bands; speech processing applications; speech production; traditional pulse excitation; Frequency modulation; Principal component analysis; Speech; Stochastic processes; Speech analysis; excitation modeling; glottal flow
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2169787