DocumentCode :
679523
Title :
Maximizing Expected Model Change for Active Learning in Regression
Author :
Wenbin Cai ; Ya Zhang ; Jun Zhou
Author_Institution :
Shanghai Key Laboratory of Multimedia Processing and Transmissions, Shanghai Jiao Tong University, Shanghai, China
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
51
Lastpage :
60
Abstract :
Active learning is well-motivated in many supervised learning tasks where unlabeled data may be abundant but labeled examples are expensive to obtain. The goal of active learning is to maximize the performance of a learning model using as little labeled training data as possible, thereby minimizing the cost of data annotation. So far, there has been very limited work on active learning for regression. In this paper, we propose a new active learning framework for regression called Expected Model Change Maximization (EMCM), which aims to choose the examples that lead to the largest change in the current model. The model change is measured as the difference between the current model parameters and the updated parameters after training on the enlarged training set. Inspired by the Stochastic Gradient Descent (SGD) update rule, the change is estimated as the gradient of the loss with respect to a candidate example. Under this framework, we derive novel active learning algorithms for both linear and nonlinear regression to select the most informative examples. Extensive experiments on benchmark data sets from the UCI machine learning repository demonstrate that the proposed algorithms are highly effective in choosing the most informative examples and robust to various types of data distributions.
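The selection rule sketched in the abstract can be illustrated for linear regression: the SGD step induced by a candidate x is proportional to (w·x − y)x, so its norm estimates the model change, and the unknown label y is marginalized with a small bootstrap ensemble of predictors. The following is a minimal sketch under those assumptions, not the authors' exact algorithm; the function name `emcm_select`, the ensemble size, and the use of ordinary least squares are illustrative choices.

```python
import numpy as np

def emcm_select(X_pool, labeled_X, labeled_y, n_boot=4, seed=None):
    """Return the index of the pool example with the largest
    expected model change, for linear least-squares regression.

    Change for candidate x is approximated by the SGD gradient
    norm |w.x - y| * ||x||; the unknown label y is replaced by
    predictions from a small bootstrap ensemble and averaged.
    """
    rng = np.random.default_rng(seed)
    n = labeled_X.shape[0]
    # Current model: ordinary least squares on the labeled set.
    w, *_ = np.linalg.lstsq(labeled_X, labeled_y, rcond=None)
    # Bootstrap ensemble to stand in for the unknown true labels.
    boot_preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        wb, *_ = np.linalg.lstsq(labeled_X[idx], labeled_y[idx], rcond=None)
        boot_preds.append(X_pool @ wb)
    boot_preds = np.stack(boot_preds)        # shape (n_boot, n_pool)
    residuals = boot_preds - X_pool @ w      # predicted-label residuals
    # Expected gradient norm, averaged over the ensemble.
    scores = (np.abs(residuals) * np.linalg.norm(X_pool, axis=1)).mean(axis=0)
    return int(np.argmax(scores))
```

Intuitively, examples that are both far from the origin and predicted inconsistently by the ensemble produce large gradients, so they are queried first; the same recipe extends to nonlinear regression by replacing the linear model with the paper's gradient-boosted trees.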
Keywords :
expectation-maximisation algorithm; gradient methods; learning (artificial intelligence); regression analysis; stochastic processes; EMCM; SGD update rule; UCI machine learning repository; active learning algorithms; active learning framework; data annotation; data distributions; enlarged training set; expected model change maximization; labeled training data; nonlinear regression; stochastic gradient descent update rule; supervised learning; unlabeled data; Current measurement; Data models; Linear regression; Machine learning algorithms; Regression tree analysis; Training; Training data; Active learning; Expected Model Change Maximization; Linear Regression; Nonlinear regression;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2013 IEEE 13th International Conference on Data Mining (ICDM)
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.104
Filename :
6729489