DocumentCode :
3697408
Title :
Audio super-resolution using concatenative resynthesis
Author :
Michael I Mandel;Young Suk Cho
Author_Institution :
Brooklyn College, CUNY, Computer &
fYear :
2015
Firstpage :
1
Lastpage :
5
Abstract :
This paper applies a recently introduced non-linear dictionary-based denoising system to another voice mapping task: transforming low-bandwidth, low-bitrate speech into high-bandwidth, high-quality speech. The system uses a deep neural network as a learned non-linear comparison function to drive unit selection in a concatenative synthesizer built from clean recordings. The network is trained to predict whether a given clean audio segment from the dictionary could be transformed into a given segment of the degraded observation. Speaker-dependent experiments on the small-vocabulary CHiME2-GRID corpus show that the model can resynthesize high-quality clean speech from degraded observations. Preliminary listening tests show that the system improves subjective speech quality ratings by up to 50 percentage points, whereas a similar system based on non-negative matrix factorization and trained on the same data produces no significant improvement.
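To make the selection mechanism described above concrete, here is a minimal, hypothetical Python sketch of the core idea: a learned comparison function scores each clean dictionary segment against a degraded observation segment, and the highest-scoring clean unit is chosen for resynthesis. The tiny MLP, the feature sizes, and the function names (`comparison_score`, `select_units`) are illustrative assumptions, not the authors' implementation, which uses a deep network and a full concatenative synthesizer.

```python
import numpy as np

def comparison_score(clean_seg, degraded_seg, W1, b1, w2, b2):
    """Hypothetical stand-in for the learned comparison function:
    a one-hidden-layer MLP mapping a (clean, degraded) feature pair
    to a match probability. The paper's system uses a deep neural
    network trained on clean/degraded segment pairs."""
    x = np.concatenate([clean_seg, degraded_seg])
    h = np.maximum(0.0, W1 @ x + b1)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # sigmoid match score

def select_units(degraded_segments, clean_dictionary, params):
    """Greedy unit selection: for each degraded segment, pick the clean
    dictionary segment the comparison network rates as most likely to
    explain it. A real concatenative system would also account for
    concatenation costs between adjacent selected units."""
    W1, b1, w2, b2 = params
    selected = []
    for d in degraded_segments:
        scores = [comparison_score(c, d, W1, b1, w2, b2)
                  for c in clean_dictionary]
        selected.append(clean_dictionary[int(np.argmax(scores))])
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_dim, hidden = 40, 64  # assumed feature and hidden-layer sizes
    params = (rng.standard_normal((hidden, 2 * feat_dim)) * 0.1,
              np.zeros(hidden),
              rng.standard_normal(hidden) * 0.1,
              0.0)
    clean_dict = [rng.standard_normal(feat_dim) for _ in range(100)]
    degraded = [rng.standard_normal(feat_dim) for _ in range(10)]
    units = select_units(degraded, clean_dict, params)
    print(f"selected {len(units)} clean units for resynthesis")
```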
Keywords :
"Speech","Dictionaries","Neural networks","Speech processing","Bandwidth","Packet loss"
Publisher :
IEEE
Conference_Title :
2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Type :
conf
DOI :
10.1109/WASPAA.2015.7336890
Filename :
7336890