DocumentCode :
2330110
Title :
A collective data generation method for speech language models
Author :
Liu, Sean ; Seneff, Stephanie ; Glass, James
Author_Institution :
Comput. Sci. & Artificial Intell. Lab., MIT, Cambridge, MA, USA
fYear :
2010
fDate :
12-15 Dec. 2010
Firstpage :
223
Lastpage :
228
Abstract :
Recently we began using Amazon Mechanical Turk (AMT), an Internet marketplace, to deploy our spoken dialogue systems to large audiences for user testing and data collection purposes. This crowdsourcing method of collecting data contrasts with the time- and labor-intensive developer annotation methods. In this paper, we compare these data in various combinations with traditionally-collected corpora for training our speech recognizer's language model. Our results show that AMT text queries are effective for initial language model training for spoken dialogue systems, and that crowd-sourced speech collection within the context of a spoken dialogue framework provides significant improvement.
Keywords :
data handling; interactive systems; speech recognition; Amazon mechanical turk; collective data generation method; crowdsourcing method; speech language model; speech recognizer language model; spoken dialogue systems; time-and labor-intensive developer annotation method; Amazon Mechanical Turk; Language models; crowdsourcing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
Type :
conf
DOI :
10.1109/SLT.2010.5700855
Filename :
5700855