DocumentCode
62822
Title
Whisper-to-speech conversion using restricted Boltzmann machine arrays
Author
Jing-jie Li; Ian Vince McLoughlin; Li-Rong Dai; Zhen-Hua Ling
Author_Institution
Univ. of Sci. & Technol. of China, Hefei, China
Volume
50
Issue
24
fYear
2014
fDate
11 20 2014
Firstpage
1781
Lastpage
1782
Abstract
Whispers are a natural vocal communication mechanism in which the vocal cords do not vibrate normally. The lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to the processing of whispers, including the conversion of whispers to speech. However, among the several existing approaches, even the best reconstructed speech to date still contains obviously artificial muffles and suffers from unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of the estimated pitch is improved by applying machine learning techniques to pitch estimation only within voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with state-of-the-art approaches.
Keywords
Boltzmann machines; learning (artificial intelligence); speech intelligibility; speech processing; statistical analysis; Gaussian mixture model; RBM arrays; artificial muffle; glottal-induced pitch; human-to-human vocal communication mechanism; inherent noise-like spectral distribution; machine learning technique; pitch accuracy; pitch estimation; restricted Boltzmann machine array; speech reconstruction; speech spectral envelope; statistical conversion model; unnatural prosody; vocal cord; voiced region; whisper processing; whisper-to-speech conversion
fLanguage
English
Journal_Title
Electronics Letters
Publisher
IET
ISSN
0013-5194
Type
jour
DOI
10.1049/el.2014.1645
Filename
6969246