Title :
Analyzing quality of crowd-sourced speech transcriptions of noisy audio for acoustic model adaptation
Author :
Audhkhasi, Kartik ; Georgiou, Panayiotis G. ; Narayanan, Shrikanth S.
Author_Institution :
Signal Anal. & Interpretation Lab. (SAIL), Univ. of Southern California, Los Angeles, CA, USA
Abstract :
The accuracy of crowd-sourced speech transcriptions varies depending on a variety of factors. This paper studies the impact of one such factor, namely, the quality of audio. We employed a speech database with babble noise at three SNR levels (clean, 2 dB and -2 dB) and asked workers on Amazon Mechanical Turk to transcribe it. Two interesting observations emerge. First, as expected, the quality of transcripts combined by word-frequency-based ROVER decreases with decreasing SNR. Further, we demonstrate that the use of some unsupervised reliability scores can improve the transcription quality, with increasing benefits at lower SNR. Second, we do not observe a significant drop in the performance of acoustic models adapted with increasing transcription noise. This highlights the surprising robustness of crowd-sourced transcripts for acoustic model adaptation.
Keywords :
reliability; speech recognition; Amazon Mechanical Turk; ROVER; SNR levels; acoustic model adaptation; audio quality; babble noise; crowd-sourced speech transcriptions; noisy audio; speech database; unsupervised reliability; Acoustics; Adaptation models; Error analysis; Noise; Noise measurement; Reliability; Speech; Crowd-sourcing; automatic speech recognition; speech transcription
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288829