DocumentCode :
2802323
Title :
A framework for parametric singing voice analysis/synthesis
Author :
Kim, Youngmoo E.
Author_Institution :
Media Lab., MIT, Cambridge, MA, USA
fYear :
2003
fDate :
19-22 Oct. 2003
Firstpage :
123
Lastpage :
126
Abstract :
The singing voice is the most variable and flexible of musical instruments. All voices are capable of producing the common phonemes necessary for language understanding and communication, yet each voice possesses distinctive qualities that are seemingly independent of phonemes and words. The unique acoustic qualities of an individual singer's voice arise from a combination of innate physical factors (e.g., vocal tract and vocal fold physiology) and time-varying characteristics of performance (e.g., pronunciation and musical expression). This research introduces a framework for singing voice analysis/synthesis that takes both physical and expressive factors into account by estimating source-filter voice model parameters (representing the physiology) and modeling the dynamic behavior of these features over time using a hidden Markov model (to represent aspects of expression). Historically, source and filter model features have been calculated independently, but here they are estimated jointly for better modelling of source-filter dependencies common in singing. Additionally, the vocal tract filter is estimated on a warped frequency scale, which more accurately reflects the frequency sensitivity of human perception. This framework has many possible applications, including singing voice analysis/synthesis and singer identification.
Keywords :
audio signal processing; hidden Markov models; parameter estimation; speaker recognition; speech; audio signal processing; hidden Markov model; innate physical factors; musical expression; pronunciation; singer identification; singing voice analysis; singing voice synthesis; source-filter voice model parameter estimation; time-varying characteristics; vocal fold physiology; vocal tract physiology; Filters; Frequency estimation; Hidden Markov models; Humans; Instruments; Lips; Physiology; Shape; Speech analysis; Tongue;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on.
Print_ISBN :
0-7803-7850-4
Type :
conf
DOI :
10.1109/ASPAA.2003.1285835
Filename :
1285835