• DocumentCode
    253049
  • Title
    Active learning from uncertain crowd annotations
  • Author
    Yan Yan ; Rosales, R. ; Fung, G. ; Dy, J.
  • Author_Institution
    Northeastern Univ. & is now at Yahoo! Labs., Sunnyvale, CA, USA
  • fYear
    2014
  • fDate
    Sept. 30 2014-Oct. 3 2014
  • Firstpage
    385
  • Lastpage
    392
  • Abstract
    In supervised learning, a teacher provides labels for given data samples, and the goal is to predict the labels of unseen instances. In general, these labelers may make mistakes. Typical learning methods rely on an often overlooked assumption that a single expert can provide the required supervision; however, it is becoming more common for supervision to be available in many forms, as data can be shared and processed by increasingly large audiences. This makes it possible for not just one but many labelers to offer some form of supervision (this phenomenon is known as crowdsourcing). Some annotators may be more reliable than others, some may be malicious, and some may be correlated with others. Annotator effectiveness may also vary depending on the data instance presented. We utilize a probabilistic model for learning a classifier from multiple annotators, where the reliability of the annotators may vary with the annotator and the data that they observe. Although we may have access to many annotators, labeling is still expensive, and not all annotators have the same level of expertise. The general problem of intelligently choosing instances for labeling is known as active learning. The crowdsourcing paradigm poses new challenges for active learning: we are interested not only in which sample to label next but also in which annotator should be queried to benefit our learning model the most. This paper presents different approaches for performing active learning in the crowdsourcing setting.
  • Keywords
    graph theory; learning (artificial intelligence); outsourcing; pattern classification; probability; active learning; crowd annotation; crowdsourcing; data classifier; graphical model; probabilistic model; Data models; Equations; Labeling; Mathematical model; Reliability; Training; Uncertainty; Active Learning; Adversarial Annotators; Classification; Crowd Sourcing; Graphical Models; Multiple Annotation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on
  • Conference_Location
    Monticello, IL
  • Type
    conf
  • DOI
    10.1109/ALLERTON.2014.7028481
  • Filename
    7028481