Augmented speech production based on real-time statistical voice conversion

Author

Toda, Tomoki

Author_Institution

Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan

fYear

2014

fDate

3-5 Dec. 2014

Firstpage

592

Lastpage

596

Abstract

In human-to-human speech communication, various barriers are caused by some constraints, such as physical constraints causing vocal disorders and environmental constraints making it hard to produce intelligible speech. These barriers would be overcome if our speech production were augmented so that we could produce speech sounds as we want beyond these constraints. Voice conversion (VC) is a technique for modifying speech acoustics, converting non-/para-linguistic information to any form we want while preserving the linguistic content. One of the most popular approaches to VC is based on statistical processing, which is capable of extracting a complex conversion function in a data-driven manner. Although this technique was originally studied in the context of speaker conversion, which converts the voice of a certain speaker to sound like that of another specific speaker, it has great potential to achieve various applications beyond speaker conversion. This paper briefly reviews a trajectory-based conversion method that is capable of effectively reproducing natural speech parameter trajectories utterance by utterance and highlights several techniques that extend this trajectory-based conversion method to achieve real-time conversion processing. Finally, this paper shows some examples of real-time VC applications to enhance human-to-human speech communication, such as speaking-aid, silent speech communication, and voice changer/vocal effector.

Keywords

speaker recognition; speech processing; statistical analysis; augmented speech production; complex conversion function extraction; data-driven method; environmental constraints; human-to-human speech communication enhancement; intelligible speech production; linguistic content preservation; natural speech parameter trajectory reproduction; nonlinguistic information; paralinguistic information; physical constraints; real-time VC applications; real-time conversion processing; real-time statistical voice conversion; silent speech communication; speaker conversion; speaking-aid; speech acoustics; speech sound production; statistical processing; trajectory-based conversion method; vocal disorders; vocal effector; voice changer; Hidden Markov models; Real-time systems; Speech; Speech enhancement; Vectors; real-time processing; statistical voice conversion

fLanguage

English

Publisher

IEEE

Conference_Title

2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Conference_Location

Atlanta, GA

Type

conf

DOI

10.1109/GlobalSIP.2014.7032186

Filename

7032186

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=257809