DocumentCode :
1305316
Title :
Combining Speech Fragment Decoding and Adaptive Noise Floor Modeling
Author :
Ma, Ning ; Barker, Jon ; Christensen, Heidi ; Green, Phil
Author_Institution :
Dept. of Comput. Sci., Univ. of Sheffield, Sheffield, UK
Volume :
20
Issue :
3
fYear :
2012
fDate :
3/1/2012
Firstpage :
818
Lastpage :
827
Abstract :
This paper presents a novel noise-robust automatic speech recognition (ASR) system that combines aspects of the noise modeling and source separation approaches to the problem. The combined approach is motivated by the observation that the noise backgrounds encountered in everyday listening situations can be roughly characterized as a slowly varying noise floor in which a mixture of energetic but unpredictable acoustic events is embedded. Our solution combines two complementary techniques. First, an adaptive noise floor model estimates the degree to which high-energy acoustic events are masked by the noise floor (represented by a soft missing data mask). Second, a fragment decoding system attempts to interpret the high-energy regions that are not accounted for by the noise floor model. This component uses models of the target speech to decide whether fragments should be included in the target speech stream. Our experiments on the CHiME corpus task show that the combined approach performs significantly better than systems using either the noise model or the fragment decoding approach alone, and substantially outperforms multicondition training.
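The idea of a soft missing data mask derived from a slowly varying noise floor can be illustrated with a minimal sketch. This is not the authors' model: the recursive floor tracker, the sigmoid mapping, and the names `soft_missing_data_mask`, `alpha`, `slope` and `threshold_db` are all assumptions chosen for illustration. The input is assumed to be a log-energy spectrogram in dB with shape (frames, channels).

```python
import numpy as np


def soft_missing_data_mask(log_spec, alpha=0.95, slope=0.5, threshold_db=3.0):
    """Illustrative soft mask from a slowly adapting noise floor.

    All parameters are hypothetical and are not taken from the paper.
    log_spec: array of shape (frames, channels), log energies in dB.
    Returns values in [0, 1]; values near 1 mark cells that stand well
    above the estimated floor.
    """
    log_spec = np.asarray(log_spec, dtype=float)
    floor = np.empty_like(log_spec)
    floor[0] = log_spec[0]
    for t in range(1, log_spec.shape[0]):
        # Smoothed update: the floor rises only slowly towards louder frames.
        smoothed = alpha * floor[t - 1] + (1.0 - alpha) * log_spec[t]
        # The floor follows drops in energy immediately.
        floor[t] = np.minimum(log_spec[t], smoothed)
    # Local energy relative to the floor, in dB, mapped through a sigmoid.
    margin_db = log_spec - floor - threshold_db
    return 1.0 / (1.0 + np.exp(-slope * margin_db))


# Usage: a flat 10 dB background with one 40 dB event.
spec = np.full((10, 4), 10.0)
spec[5, 2] = 40.0
mask = soft_missing_data_mask(spec)
```

In this sketch the mask is close to 1 only where a cell stands well above the tracked floor. Such cells correspond loosely to the high-energy regions that the abstract says are passed on to the fragment decoder for interpretation.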
Keywords :
decoding; speech coding; CHiME corpus task; adaptive noise floor modeling; fragment decoding system; high-energy acoustic events; high-energy regions; noise-robust ASR system; noise-robust automatic speech recognition system; source separation approach; speech fragment decoding; speech stream; Adaptation models; Data models; Decoding; Noise; Reliability; Speech; Speech recognition; Adaptive noise floor modeling; fragment decoding; missing data decoding; noise robust speech recognition;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2165945
Filename :
5995287