• DocumentCode
    1544068
  • Title
    Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement
  • Author
    Toda, Tomoki ; Nakagiri, Mikihiro ; Shikano, Kiyohiro
  • Author_Institution
    Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara, Japan
  • Volume
    20
  • Issue
    9
  • fYear
    2012
  • Firstpage
    2505
  • Lastpage
    2517
  • Abstract
    In this paper, we present statistical approaches to enhancing body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called the nonaudible murmur (NAM) microphone can effectively detect very soft unvoiced speech, such as NAM or a whispered voice, while keeping the speech sounds emitted outside almost inaudible. However, body-conducted unvoiced speech is difficult to use in human-to-human speech communication because it sounds unnatural and is less intelligible owing to the acoustic changes caused by body conduction. To address this issue, we propose voice conversion (VC) methods from NAM to normal speech (NAM-to-Speech) and from NAM to a whispered voice (NAM-to-Whisper), in which the acoustic features of body-conducted unvoiced speech are converted into those of natural voices in a probabilistic manner using Gaussian mixture models (GMMs). Moreover, these methods are extended to convert not only NAM but also a body-conducted whispered voice (BCW), another type of body-conducted unvoiced speech. Several experimental evaluations are conducted to demonstrate the effectiveness of the proposed methods. The results show that 1) NAM-to-Speech effectively improves intelligibility but degrades naturalness, owing to the difficulty of estimating natural fundamental frequency contours from unvoiced speech; 2) NAM-to-Whisper significantly outperforms NAM-to-Speech in terms of both intelligibility and naturalness; and 3) the proposed VC methods can be used to build a single conversion model capable of converting both NAM and BCW.
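    The abstract describes converting source acoustic features into target features "in a probabilistic manner using GMMs." A common textbook form of this idea models the joint density of source and target feature vectors [x; y] with a GMM and maps each source frame to the conditional expectation E[y | x]. The sketch below shows only that simpler frame-by-frame minimum mean square error mapping, assuming the joint GMM parameters are already trained; it is not claimed to be the paper's exact conversion procedure, which may differ (for example, by estimating whole feature trajectories). The function names `gaussian_logpdf` and `gmm_mmse_convert` are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    # Cholesky factorization for a numerically stable Mahalanobis distance.
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, diff.T)  # shape (d, n)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def gmm_mmse_convert(x, weights, means, covs, dx):
    """Hypothetical helper: map source frames x (n, dx) to target frames.

    Assumes a pre-trained joint-density GMM over [x; y]:
      weights (M,), means (M, dx+dy), covs (M, dx+dy, dx+dy).
    Returns E[y | x] = sum_m P(m | x) * (mu_y + S_yx S_xx^{-1} (x - mu_x)).
    """
    n = x.shape[0]
    M = weights.shape[0]
    dy = means.shape[1] - dx
    log_post = np.empty((n, M))
    cond_means = np.empty((M, n, dy))
    for m in range(M):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Unnormalized log posterior of component m given each source frame.
        log_post[:, m] = np.log(weights[m]) + gaussian_logpdf(x, mu_x, s_xx)
        # Regression gain S_yx S_xx^{-1}, shape (dy, dx).
        gain = np.linalg.solve(s_xx, s_yx.T).T
        # Conditional mean of y given x under component m, shape (n, dy).
        cond_means[m] = mu_y + (x - mu_x) @ gain.T
    # Normalize posteriors P(m | x) in the log domain to avoid underflow.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the per-component conditional means.
    return np.einsum("nm,mnd->nd", post, cond_means)


if __name__ == "__main__":
    # Toy single-component model: with unit variances and covariance 0.5,
    # E[y | x] = 1 + 0.5 * x, so inputs 0 and 2 map to 1.0 and 2.0.
    weights = np.array([1.0])
    means = np.array([[0.0, 1.0]])
    covs = np.array([[[1.0, 0.5], [0.5, 1.0]]])
    print(gmm_mmse_convert(np.array([[0.0], [2.0]]), weights, means, covs, dx=1))
```

    With more components, each source frame is softly assigned to the mixture components, so the overall mapping is a locally linear, globally nonlinear function of the source features.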
  • Keywords
    Gaussian processes; frequency estimation; microphones; speech enhancement; speech intelligibility; statistical analysis; BCW; GMM; Gaussian mixture models; NAM microphone; NAM-to-speech intelligibility; VC methods; body-conducted unvoiced speech enhancement; body-conducted whispered voice; body-conductive microphone; natural fundamental frequency contour estimation; nonaudible murmur microphone; silent speech communication; soft unvoiced speech detection; statistical approaches; statistical voice conversion techniques; Acoustics; Feature extraction; Microphones; Speech; Speech enhancement; Vectors; Silent speech; body-conducted unvoiced speech; nonaudible murmur; voice conversion; whispered voice;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2205241
  • Filename
    6220854