Active learning from uncertain crowd annotations

Author

Yan Yan ; Rosales, R. ; Fung, G. ; Dy, J.

Author_Institution

Northeastern Univ.; now at Yahoo! Labs, Sunnyvale, CA, USA

fYear

2014

fDate

Sept. 30 2014-Oct. 3 2014

Firstpage

385

Lastpage

392

Abstract

In supervised learning, a teacher provides labels for given data samples, and the goal is to predict the labels of unseen instances. In general, these labelers may make mistakes. Typical learning methods rely on an often overlooked assumption that a single expert can provide the required supervision; however, it is becoming more common for supervision to be available in many forms, as data can be shared and processed by increasingly large audiences. This makes it possible for not just one but many labelers to offer some form of supervision (this phenomenon is known as crowdsourcing). Some annotators may be more reliable than others, some may be malicious, and some may be correlated with others. Annotator effectiveness may also vary depending on the data instance presented. We utilize a probabilistic model for learning a classifier from multiple annotators, where reliability may vary with both the annotator and the data that they observe. Although we may have access to many annotators, labeling is still expensive and not all annotators have the same level of expertise. The general problem of intelligently choosing instances for labeling is known as active learning. The crowdsourcing paradigm poses new challenges to active learning: we are interested not only in which sample to label next but also in which annotator should be queried to benefit our learning model the most. This paper presents different approaches for performing active learning in the crowdsourcing setting.

Keywords

graph theory; learning (artificial intelligence); outsourcing; pattern classification; probability; active learning; crowd annotation; crowdsourcing; data classifier; graphical model; probabilistic model; Data models; Equations; Labeling; Mathematical model; Reliability; Training; Uncertainty; Active Learning; Adversarial Annotators; Classification; Crowd Sourcing; Graphical Models; Multiple Annotation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on

Conference_Location

Monticello, IL

Type

conf

DOI

10.1109/ALLERTON.2014.7028481

Filename

7028481