DocumentCode :
27613
Title :
Active Learning With Imbalanced Multiple Noisy Labeling
Author :
Jing Zhang ; Xindong Wu ; Sheng, Victor S.
Author_Institution :
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
Volume :
45
Issue :
5
fYear :
2015
fDate :
May 2015
Firstpage :
1081
Lastpage :
1093
Abstract :
With crowdsourcing systems, it is easy to collect multiple noisy labels for the same object for supervised learning. This dynamic annotation procedure fits the active learning perspective and accompanies the imbalanced multiple noisy labeling problem. This paper proposes a novel active learning framework with multiple imperfect annotators involved in crowdsourcing systems. The framework contains two core procedures: label integration and instance selection. In the label integration procedure, a positive label threshold (PLAT) algorithm is introduced to induce the class membership from the multiple noisy label set of each instance in a training set. PLAT solves the imbalanced labeling problem by dynamically adjusting the threshold for determining the class membership of an example. Furthermore, three novel instance selection strategies are proposed to adapt PLAT for improving the learning performance. These strategies are respectively based on the uncertainty derived from the multiple labels, the uncertainty derived from the learned model, and the combination method (CFI). Experimental results on 12 datasets with different underlying class distributions demonstrate that the three novel instance selection strategies significantly improve the learning performance, and CFI has the best performance when labeling behaviors exhibit different levels of imbalance in crowdsourcing systems. We also apply our methods to a real-world scenario, obtaining noisy labels from Amazon Mechanical Turk, and show that our proposed strategies achieve very high performance.
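The label integration idea in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = positive, 0 = negative) and a threshold supplied by hand. The abstract does not say how PLAT estimates its threshold, so that step is not reproduced here. The function names `positive_fraction`, `integrate_label`, and `most_uncertain` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def positive_fraction(labels: List[int]) -> float:
    """Fraction of positive (1) labels in one instance's noisy label set."""
    return sum(labels) / len(labels)


def integrate_label(labels: List[int], threshold: float = 0.5) -> int:
    """Assign class 1 when the positive fraction reaches the threshold.

    threshold=0.5 behaves like majority voting (ties go to positive).
    A lower threshold compensates for annotators who are biased toward
    negative labels. The value is an assumption passed in by the caller,
    not the threshold estimate used by PLAT.
    """
    return 1 if positive_fraction(labels) >= threshold else 0


def most_uncertain(label_sets: Dict[str, List[int]], threshold: float) -> str:
    """Pick the instance whose positive fraction lies closest to the threshold.

    This is one simple reading of "uncertainty derived from the multiple
    labels"; the selection criteria in the paper may differ.
    """
    return min(
        label_sets,
        key=lambda name: abs(positive_fraction(label_sets[name]) - threshold),
    )


if __name__ == "__main__":
    noisy = [1, 0, 0, 0, 1]  # two of five annotators said "positive"
    print(integrate_label(noisy, threshold=0.5))  # majority-vote style
    print(integrate_label(noisy, threshold=0.3))  # lowered threshold

    pool = {"a": [1, 1, 1, 1, 1], "b": [1, 0, 0, 0, 0], "c": [0, 0, 0, 0, 0]}
    print(most_uncertain(pool, threshold=0.3))
```

In an active learning loop, the instance returned by `most_uncertain` would be the next one sent to annotators for additional labels.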
Keywords :
data integration; learning (artificial intelligence); pattern classification; Amazon Mechanical Turk; CFI; PLAT algorithm; active learning framework; class distributions; class membership; combination method; crowdsourcing systems; dynamic annotation procedure; imbalanced multiple noisy labeling; instance selection strategies; label integration; learning performance; multiple noisy label set; positive label threshold algorithm; supervised learning; training set; Accuracy; Crowdsourcing; Labeling; Measurement uncertainty; Noise measurement; Training; Uncertainty; Active learning; crowdsourcing; imbalanced learning; repeated labeling; supervised classification;
fLanguage :
English
Journal_Title :
Cybernetics, IEEE Transactions on
Publisher :
IEEE
ISSN :
2168-2267
Type :
jour
DOI :
10.1109/TCYB.2014.2344674
Filename :
6878424