DocumentCode
3131819
Title
Crowdsourcing the acquisition of natural language corpora: Methods and observations
Author
Wang, Wei Yu ; Bohus, D. ; Kamar, E. ; Horvitz, Eric
Author_Institution
Microsoft Res., Redmond, WA, USA
fYear
2012
fDate
2-5 Dec. 2012
Firstpage
73
Lastpage
78
Abstract
We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.
Keywords
natural language processing; crowdsourcing methods; frame semantics; language corpora; list-based descriptions; natural language processing systems; natural language sentences; semantic biases; semantic correctness; semantic naturalness; Grammar; Humans; Natural languages; Ontologies; Remuneration; Semantics; Sensitivity; crowdsourcing; language understanding; natural language elicitation methods; spoken dialog
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location
Miami, FL
Print_ISBN
978-1-4673-5125-6
Electronic_ISBN
978-1-4673-5124-9
Type
conf
DOI
10.1109/SLT.2012.6424200
Filename
6424200