DocumentCode
3162067
Title
Analyzing quality of crowd-sourced speech transcriptions of noisy audio for acoustic model adaptation
Author
Audhkhasi, Kartik ; Georgiou, Panayiotis G. ; Narayanan, Shrikanth S.
Author_Institution
Signal Anal. & Interpretation Lab. (SAIL), Univ. of Southern California, Los Angeles, CA, USA
fYear
2012
fDate
25-30 March 2012
Firstpage
4137
Lastpage
4140
Abstract
The accuracy of crowd-sourced speech transcriptions varies depending on a variety of factors. This paper studies the impact of one such factor, namely, the quality of audio. We employed a speech database with babble noise at three SNR levels (clean, 2 dB and -2 dB) and asked workers on Amazon Mechanical Turk to transcribe it. Two interesting observations emerge. First, as expected, the quality of transcripts combined by word-frequency-based ROVER decreases with decreasing SNR. Further, we demonstrate that the use of some unsupervised reliability scores can improve the transcription quality, with increasing benefits at lower SNR. Second, we do not observe a significant drop in the performance of acoustic models adapted with increasing transcription noise. This highlights the surprising robustness of crowd-sourced transcripts for acoustic model adaptation.
Keywords
reliability; speech recognition; Amazon Mechanical Turk; ROVER; SNR levels; acoustic model adaptation; audio quality; babble noise; crowd-sourced speech transcriptions; noisy audio; speech database; unsupervised reliability; Acoustics; Adaptation models; Error analysis; Noise; Noise measurement; Reliability; Speech; Crowd-sourcing; automatic speech recognition; speech transcription
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288829
Filename
6288829