Title :
Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement
Author :
Toda, Tomoki ; Nakagiri, Mikihiro ; Shikano, Kiyohiro
Author_Institution :
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
Abstract :
In this paper, we present statistical approaches to enhance body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called the nonaudible murmur (NAM) microphone is effectively used to detect very soft unvoiced speech, such as NAM or a whispered voice, while keeping the speech sounds emitted to the outside almost inaudible. However, body-conducted unvoiced speech is difficult to use in human-to-human speech communication because it sounds unnatural and less intelligible owing to the acoustic changes caused by body conduction. To address this issue, voice conversion (VC) methods from NAM to normal speech (NAM-to-Speech) and to a whispered voice (NAM-to-Whisper) are proposed, in which the acoustic features of body-conducted unvoiced speech are converted into those of natural voices in a probabilistic manner using Gaussian mixture models (GMMs). Moreover, these methods are extended to convert not only NAM but also a body-conducted whispered voice (BCW) as another type of body-conducted unvoiced speech. Several experimental evaluations are conducted to demonstrate the effectiveness of the proposed methods. The experimental results show that 1) NAM-to-Speech effectively improves intelligibility but degrades naturalness owing to the difficulty of estimating natural fundamental frequency contours from unvoiced speech; 2) NAM-to-Whisper significantly outperforms NAM-to-Speech in terms of both intelligibility and naturalness; and 3) a single conversion model capable of converting both NAM and BCW can be effectively developed with the proposed VC methods.
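The abstract states that source features are converted "in a probabilistic manner using GMMs". As a rough illustration only, the sketch below shows the conventional frame-wise minimum mean-square-error mapping under a joint-density GMM over stacked source/target features z = [x; y]: each component's conditional mean of y given x is weighted by that component's posterior probability given x. This is an assumption about the general technique, not the paper's exact procedure, which may differ (for example, by estimating whole feature trajectories rather than independent frames). The function names and the toy GMM parameters are hypothetical; in practice the parameters would be trained with EM on time-aligned parallel data.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def convert_frame(x, weights, means, covs):
    """Hypothetical MMSE conversion E[y | x] under a joint GMM over z = [x; y].

    weights: (M,) mixture weights
    means:   (M, Dx + Dy) joint mean vectors
    covs:    (M, Dx + Dy, Dx + Dy) joint covariance matrices
    x:       (Dx,) source feature frame (e.g. body-conducted speech)
    """
    d = x.shape[0]
    # Posterior probability of each component given the source frame only,
    # using the marginal p(x | m) = N(x; mu_x, S_xx).
    likes = np.array([
        w * gaussian_pdf(x, mu[:d], S[:d, :d])
        for w, mu, S in zip(weights, means, covs)
    ])
    post = likes / likes.sum()

    y = np.zeros(means.shape[1] - d)
    for p, mu, S in zip(post, means, covs):
        mu_x, mu_y = mu[:d], mu[d:]
        S_xx, S_yx = S[:d, :d], S[d:, :d]
        # Component-wise conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x)
        y += p * (mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    return y


# Toy usage with a single 1-D source / 1-D target component.
w = np.array([1.0])
mu = np.array([[0.0, 1.0]])
S = np.array([[[1.0, 0.5], [0.5, 1.0]]])
print(convert_frame(np.array([2.0]), w, mu, S))  # conditional mean 1 + 0.5 * 2
```

With a single component the posterior is 1, so the output reduces to the linear-regression form of the conditional mean; with several components, the mapping becomes a posterior-weighted blend of such regressions, which is what makes it nonlinear.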
Keywords :
Gaussian processes; frequency estimation; microphones; speech enhancement; speech intelligibility; statistical analysis; BCW; GMM; Gaussian mixture models; NAM microphone; NAM-to-speech intelligibility; VC methods; body-conducted unvoiced speech enhancement; body-conducted whispered voice; body-conductive microphone; natural fundamental frequency contour estimation; nonaudible murmur microphone; silent speech communication; soft unvoiced speech detection; statistical approaches; statistical voice conversion techniques; Acoustics; Feature extraction; Microphones; Speech; Speech enhancement; Vectors; Silent speech; body-conducted unvoiced speech; nonaudible murmur; voice conversion; whispered voice;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2205241