Crowdsourcing the acquisition of natural language corpora: Methods and observations

Author

Wang, Wei Yu ; Bohus, D. ; Kamar, E. ; Horvitz, Eric

Author_Institution

Microsoft Res., Redmond, WA, USA

fYear

2012

fDate

2-5 Dec. 2012

Firstpage

73

Lastpage

78

Abstract

We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.

Keywords

natural language processing; crowdsourcing methods; frame semantics; language corpora; list-based descriptions; natural language processing systems; natural language sentences; semantic biases; semantic correctness; semantic naturalness; Grammar; Humans; Natural languages; Ontologies; Remuneration; Semantics; Sensitivity; crowdsourcing; language understanding; natural language elicitation methods; spoken dialog

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language Technology Workshop (SLT), 2012 IEEE

Conference_Location

Miami, FL

Print_ISBN

978-1-4673-5125-6

Electronic_ISBN

978-1-4673-5124-9

Type

conf

DOI

10.1109/SLT.2012.6424200

Filename

6424200