Title :
Audio-visual voice conversion using noise-robust features
Author :
Sawada, Kazuaki ; Takehara, Masanori ; Tamura, Shinji ; Hayamizu, Satoru
Author_Institution :
Dept. of Eng., Gifu Univ., Gifu, Japan
Abstract :
Voice conversion (VC) is a technique for converting the speech of a source speaker into that of a target speaker. VC has been widely investigated, and statistical VC is used for various purposes. Conventional VC uses acoustic features; however, audio-only VC suffers from degraded performance in noisy or real-world environments. This paper proposes an audio-visual VC (AVVC) method that uses not only audio features but also visual information, i.e., lip images. The eigenlip feature is employed as the visual feature in our scheme. We also propose a feature selection approach for audio-visual features. Experiments using noisy data were conducted to evaluate our AVVC scheme in comparison with audio-only VC. The results show that AVVC can improve performance even in noisy environments when the audio and visual parameters are properly selected. It is also found that visual VC is successful. Furthermore, visual dynamic features are observed to be more effective than visual static information.
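The abstract names three ingredients: eigenlip features, visual dynamic features, and a selection among audio and visual parameters. The sketch below illustrates one common way these could be computed, and is not the authors' implementation. It assumes that eigenlip features are PCA projections of flattened grayscale lip images, that dynamic features are first-order central differences over time, and that audio and visual frames are already time-aligned. All function and parameter names (`fit_eigenlips`, `eigenlip_features`, `delta`, `audio_visual_features`, `use_static`, `use_dynamic`) are hypothetical.

```python
import numpy as np


def fit_eigenlips(images, n_components):
    """Learn a PCA basis ("eigenlips") from lip images of shape (n_frames, h, w)."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal axes of the mean-centered images.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def eigenlip_features(images, mean, basis):
    """Project lip images onto the basis: static visual features per frame."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    return (flat - mean) @ basis.T


def delta(features):
    """First-order dynamic features via central differences (edges repeated)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def audio_visual_features(audio, visual_static, use_static=True, use_dynamic=True):
    """Concatenate audio features with the selected visual streams.

    Assumption: both inputs have one row per time-aligned frame.
    """
    parts = [audio]
    if use_static:
        parts.append(visual_static)
    if use_dynamic:
        parts.append(delta(visual_static))
    return np.hstack(parts)
```

Calling `audio_visual_features(audio, static, use_static=False)` keeps only the dynamic visual stream alongside the audio features, which mirrors the kind of stream selection the abstract describes. Which combination works best is an empirical question that this sketch does not answer.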
Keywords :
audio coding; audio-visual systems; speaker recognition; speech processing; Eigenlip feature; acoustic features; audio-visual features; audio-visual voice conversion; feature selection; lip images; noise-robust features; source speaker; speech data; target speaker; visual information; visual static information; Acoustics; Feature extraction; Noise; Noise measurement; Speech; Speech recognition; Visualization; audio-visual processing; noise robustness; voice conversion
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6855138