• DocumentCode
    3131819
  • Title

    Crowdsourcing the acquisition of natural language corpora: Methods and observations

  • Author

    Wang, Wei Yu ; Bohus, D. ; Kamar, E. ; Horvitz, Eric

  • Author_Institution
    Microsoft Res., Redmond, WA, USA
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    73
  • Lastpage
    78
  • Abstract
    We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.
  • Keywords
    natural language processing; crowdsourcing methods; frame semantics; language corpora; list-based descriptions; natural language processing systems; natural language sentences; semantic biases; semantic correctness; semantic naturalness; Grammar; Humans; Natural languages; Ontologies; Remuneration; Semantics; Sensitivity; crowdsourcing; language understanding; natural language elicitation methods; spoken dialog
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type

    conf

  • DOI
    10.1109/SLT.2012.6424200
  • Filename
    6424200