DocumentCode
257809
Title
Augmented speech production based on real-time statistical voice conversion
Author
Toda, Tomoki
Author_Institution
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
fYear
2014
fDate
3-5 Dec. 2014
Firstpage
592
Lastpage
596
Abstract
In human-to-human speech communication, various barriers arise from constraints, such as physical constraints causing vocal disorders and environmental constraints making it hard to produce intelligible speech. These barriers could be overcome if our speech production were augmented so that we could produce speech sounds as we want, beyond these constraints. Voice conversion (VC) is a technique for modifying speech acoustics, converting non-/para-linguistic information to any form we want while preserving the linguistic content. One of the most popular approaches to VC is based on statistical processing, which is capable of extracting a complex conversion function in a data-driven manner. Although this technique was originally studied in the context of speaker conversion, which converts the voice of a certain speaker to sound like that of another specific speaker, it has great potential for various applications beyond speaker conversion. This paper briefly reviews a trajectory-based conversion method that is capable of effectively reproducing natural speech parameter trajectories utterance by utterance, and highlights several techniques that extend this trajectory-based conversion method to achieve real-time conversion processing. Finally, this paper shows some examples of real-time VC applications that enhance human-to-human speech communication, such as speaking-aid, silent speech communication, and voice changer/vocal effector.
Keywords
speaker recognition; speech processing; statistical analysis; augmented speech production; complex conversion function extraction; data-driven method; environmental constraints; human-to-human speech communication enhancement; intelligible speech production; linguistic content preservation; natural speech parameter trajectory reproduction; nonlinguistic information; paralinguistic information; physical constraints; real-time VC applications; real-time conversion processing; real-time statistical voice conversion; silent speech communication; speaker conversion; speaking-aid; speech acoustics; speech sound production; statistical processing; trajectory-based conversion method; vocal disorders; vocal effector; voice changer; Hidden Markov models; Real-time systems; Speech; Speech enhancement; Vectors; real-time processing; statistical voice conversion
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/GlobalSIP.2014.7032186
Filename
7032186