Title :
Speech Enhancement Using Generative Dictionary Learning
Author :
Sigg, Christian D. ; Dikk, Tomas ; Buhmann, Joachim M.
Author_Institution :
Swiss Fed. Office of Meteorol. & Climatology (MeteoSwiss), Zurich, Switzerland
Abstract :
The enhancement of speech degraded by real-world interferers is a highly relevant and difficult task. Its importance arises from the multitude of practical applications, whereas the difficulty is due to the fact that interferers are often nonstationary and potentially similar to speech. The goal of monaural speech enhancement is to separate a single mixture into its underlying clean speech and interferer components. This under-determined problem is solved by incorporating prior knowledge in the form of learned speech and interferer dictionaries. The clean speech is recovered from the degraded speech by sparse coding of the mixture in a composite dictionary, which is the concatenation of a speech dictionary and an interferer dictionary. Enhancement performance is evaluated using objective measures and is limited by two effects. Coding the mixture too sparsely causes the speech component to be explained with too few speech dictionary atoms, which induces an approximation error we denote source distortion. Conversely, coding the mixture too densely results in source confusion, where parts of the speech component are explained by interferer dictionary atoms and vice versa. Our method enables control of the trade-off between source distortion and source confusion, and therefore achieves superior performance compared to powerful approaches like geometric spectral subtraction and codebook-based filtering, for a number of challenging interferer classes such as speech babble and wind noise.
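The composite-dictionary idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the sparse coder is a plain orthogonal matching pursuit (the abstract does not name the algorithm used), the dictionaries are tiny hand-made matrices rather than learned ones, and the names `omp`, `enhance`, `D_speech`, `D_interf`, and `n_nonzero` are hypothetical. The parameter `n_nonzero` plays the role of the sparsity control discussed above.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (an assumed stand-in coder):
    approximate x using at most n_nonzero columns (atoms) of D."""
    coef = np.zeros(D.shape[1])
    residual = x.astype(float)
    support = []
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return coef

def enhance(mixture, D_speech, D_interf, n_nonzero):
    """Code the mixture in the concatenated dictionary, then split the
    reconstruction into its speech part and its interferer part."""
    D = np.hstack([D_speech, D_interf])      # composite dictionary
    coef = omp(D, mixture, n_nonzero)
    n_s = D_speech.shape[1]
    speech_est = D_speech @ coef[:n_s]       # speech atoms only
    interf_est = D_interf @ coef[n_s:]       # interferer atoms only
    return speech_est, interf_est

# Toy example with hypothetical, non-learned dictionaries.
D_speech = np.eye(4)[:, :2]                  # two "speech" atoms
D_interf = np.eye(4)[:, 2:]                  # two "interferer" atoms
mixture = np.array([1.0, 0.0, 0.0, 2.0])     # speech + interferer

speech_2, interf_2 = enhance(mixture, D_speech, D_interf, n_nonzero=2)
speech_1, interf_1 = enhance(mixture, D_speech, D_interf, n_nonzero=1)
```

In this toy setting, two atoms suffice to separate the mixture exactly, whereas allowing only one atom spends the entire budget on the stronger interferer and leaves the speech estimate empty, an extreme case of what the abstract calls source distortion. Source confusion cannot arise here because the toy dictionaries are mutually orthogonal; with coherent dictionaries, denser codes would let interferer atoms explain speech energy.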
Keywords :
filtering theory; learning (artificial intelligence); speech coding; speech enhancement; codebook-based filtering; generative dictionary learning; geometric spectral subtraction; interferer components; interferers; monaural speech enhancement; source distortion; speech component; Dictionaries; Prototypes; Speech; Speech coding; Speech enhancement; Time domain analysis; Dictionary learning; sparse coding; speech enhancement
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2187194