Title :
Querying representative points from a pool based on synthesized queries
Author :
Hu, Xuelei ; Wang, Liantao ; Yuan, Bo
Author_Institution :
Sch. of Comput. Sci. & Technol., Nanjing Univ. of Sci. & Technol., Nanjing, China
Abstract :
How to build a compact and informative training data set autonomously is crucial for many real-world learning tasks, especially those with large amount of unlabeled data and high cost of labeling. Active learning aims to address this problem by asking queries in a smart way. Two main scenarios of querying considered in the literature are query synthesis and pool-based sampling. Since in many cases synthesized queries are meaningless or difficult for human to label, more efforts have been devoted to pool-based sampling in recent years. However, in pool-based active learning, querying requires evaluating every unlabeled data point in the pool, which is usually very time-consuming. By contrast, query synthesis has clear advantage on querying time, which is independent of the pool size. In this paper, we propose a novel framework combining query synthesis and pool-based sampling to accelerate the learning process and overcome the current limitation of query synthesis. The basic idea is to select the data point nearest to the synthesized query as the query point. We also provide two simple strategies for synthesizing informative queries. Moreover, to further speed up querying, we employ clustering techniques on the whole data set to construct a representative unlabeled data pool based on cluster centers. Experiments on several real-world data sets show that our methods have distinct advantages in time complexity and similar performance compared to pool-based uncertainty sampling methods.
Keywords :
computational complexity; learning (artificial intelligence); pattern clustering; query processing; cluster centers; clustering techniques; compact training data set; data point nearest selection; informative query synthesis; informative training data set; pool-based active learning; pool-based uncertainty sampling methods; representative point querying; representative unlabeled data pool; time complexity; Accuracy; Approximation algorithms; Educational institutions; Humans; Learning systems; Nearest neighbor searches; Uncertainty;
Conference_Titel :
Neural Networks (IJCNN), The 2012 International Joint Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1488-6
Electronic_ISBN :
2161-4393
DOI :
10.1109/IJCNN.2012.6252607