DocumentCode :
1690568
Title :
Mean temporal distance: Predicting ASR error from temporal properties of speech signal
Author :
Hermansky, Hynek ; Variani, Ehsan ; Peddinti, Vijayaditya
Author_Institution :
Center for Language & Speech Process., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2013
Firstpage :
7423
Lastpage :
7426
Abstract :
Extending previous work on predicting phoneme recognition error from unlabeled data corrupted by unpredictable factors, the current work investigates a simple but effective method of estimating ASR performance by computing a function M(Δt), which represents the mean distance between speech feature vectors, evaluated over a certain finite time interval, as a function of the temporal distance Δt between the vectors. It is shown that M(Δt) is a function of the signal-to-noise ratio of the speech signal. Comparing the M(Δt) curves derived on the data used for training the classifier with those derived on test utterances allows the error on the test data to be predicted. Another interesting observation is that M(Δt) remains approximately constant once the temporal separation Δt exceeds a certain critical interval (about 200 ms), indicating the extent of coarticulation in speech sounds.
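As a rough illustration of the quantity the abstract describes, the sketch below computes a mean temporal distance curve from a matrix of feature vectors. It is not the authors' implementation: the abstract does not name the distance measure, so Euclidean distance is assumed here, and the function name `mean_temporal_distance`, the `max_lag` parameter, and the choice to express Δt as a lag in frames are all hypothetical.

```python
import numpy as np


def mean_temporal_distance(features, max_lag):
    """Mean distance between feature vectors separated by 1..max_lag frames.

    features: array of shape (n_frames, n_dims), one feature vector per frame.
    max_lag:  largest temporal separation, counted in frames.
    Returns an array `curve` where curve[k - 1] is the mean distance at lag k.

    Assumption: Euclidean distance; the abstract does not specify the measure.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    if max_lag < 1 or max_lag >= n_frames:
        raise ValueError("max_lag must be between 1 and n_frames - 1")

    curve = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Pair every frame t with frame t + lag and average the distances.
        diffs = features[lag:] - features[:-lag]
        curve[lag - 1] = np.linalg.norm(diffs, axis=1).mean()
    return curve
```

Assuming a 10 ms frame shift (also an assumption, not stated in the record), `max_lag=50` would cover separations up to 500 ms, which is enough to see whether the curve levels off beyond the roughly 200 ms interval mentioned in the abstract. Curves computed this way on training data and on a test utterance could then be compared.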
Keywords :
distortion; pattern classification; speech recognition; vectors; ASR error; automatic speech recognition; finite time interval; mean temporal distance; phoneme classification; phoneme recognition error; signal-to-noise ratio; speech feature vectors; speech signal; speech sounds; temporal separation; test utterances; Correlation; Databases; Noise; Noise measurement; Speech; Speech recognition; Vectors; automatic recognition of speech; error-rate prediction on unknown data; phoneme classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639105
Filename :
6639105