DocumentCode :
2773723
Title :
Querying representative points from a pool based on synthesized queries
Author :
Hu, Xuelei ; Wang, Liantao ; Yuan, Bo
Author_Institution :
Sch. of Comput. Sci. & Technol., Nanjing Univ. of Sci. & Technol., Nanjing, China
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
6
Abstract :
Building a compact and informative training data set autonomously is crucial for many real-world learning tasks, especially those with large amounts of unlabeled data and a high cost of labeling. Active learning aims to address this problem by asking queries in a smart way. The two main querying scenarios considered in the literature are query synthesis and pool-based sampling. Since in many cases synthesized queries are meaningless or difficult for humans to label, more effort has been devoted to pool-based sampling in recent years. However, in pool-based active learning, querying requires evaluating every unlabeled data point in the pool, which is usually very time-consuming. By contrast, query synthesis has a clear advantage in querying time, which is independent of the pool size. In this paper, we propose a novel framework combining query synthesis and pool-based sampling to accelerate the learning process and overcome the current limitation of query synthesis. The basic idea is to select the data point nearest to the synthesized query as the query point. We also provide two simple strategies for synthesizing informative queries. Moreover, to further speed up querying, we employ clustering techniques on the whole data set to construct a representative unlabeled data pool based on cluster centers. Experiments on several real-world data sets show that our methods have distinct advantages in time complexity and achieve performance similar to that of pool-based uncertainty sampling methods.
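The core idea in the abstract (synthesize a query, then ask for the label of the nearest real pool point) can be illustrated with a minimal sketch. The abstract does not describe the paper's two synthesis strategies, so the one used here, taking the midpoint of the closest pair of oppositely labeled points, is an assumption chosen for illustration only. The function names `synthesize_midpoint_query` and `nearest_pool_index` are hypothetical, binary labels 0/1 are assumed, and the brute-force nearest-neighbor search is for clarity rather than speed.

```python
import math


def synthesize_midpoint_query(labeled_points, labels):
    """Hypothetical synthesis strategy (not taken from the paper):
    return the midpoint of the closest pair of labeled points that
    carry opposite binary labels (0 and 1)."""
    pos = [p for p, y in zip(labeled_points, labels) if y == 1]
    neg = [p for p, y in zip(labeled_points, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one labeled point per class")
    # Closest opposite-label pair under Euclidean distance.
    a, b = min(
        ((p, n) for p in pos for n in neg),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    return [(ai + bi) / 2.0 for ai, bi in zip(a, b)]


def nearest_pool_index(query, pool):
    """Return the index of the pool point nearest to the synthesized
    query. Brute force; a spatial index could replace this."""
    return min(range(len(pool)), key=lambda i: math.dist(query, pool[i]))


# Usage: two labeled points and a small unlabeled pool.
labeled = [[0.0, 0.0], [4.0, 0.0]]
labels = [0, 1]
pool = [[0.0, 1.0], [2.1, 0.2], [5.0, 5.0]]

query = synthesize_midpoint_query(labeled, labels)
chosen = nearest_pool_index(query, pool)
print(query, chosen)
```

In this sketch the pool could equally be a list of cluster centers computed beforehand, which corresponds to the representative pool the abstract mentions; the clustering step itself is not shown.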
Keywords :
computational complexity; learning (artificial intelligence); pattern clustering; query processing; cluster centers; clustering techniques; compact training data set; data point nearest selection; informative query synthesis; informative training data set; pool-based active learning; pool-based uncertainty sampling methods; representative point querying; representative unlabeled data pool; time complexity; Accuracy; Approximation algorithms; Educational institutions; Humans; Learning systems; Nearest neighbor searches; Uncertainty;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
ISSN :
2161-4393
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
Type :
conf
DOI :
10.1109/IJCNN.2012.6252607
Filename :
6252607