Regional Information Center for Science and Technology - Querying representative points from a pool based on synthesized queries

DocumentCode :

2773723

Title :

Querying representative points from a pool based on synthesized queries

Author :

Hu, Xuelei ; Wang, Liantao ; Yuan, Bo

Author_Institution :

Sch. of Comput. Sci. & Technol., Nanjing Univ. of Sci. & Technol., Nanjing, China

fYear :

2012

fDate :

10-15 June 2012

Firstpage :

Lastpage :

Abstract :

How to build a compact and informative training data set autonomously is crucial for many real-world learning tasks, especially those with large amounts of unlabeled data and a high cost of labeling. Active learning aims to address this problem by asking queries in a smart way. The two main querying scenarios considered in the literature are query synthesis and pool-based sampling. Since in many cases synthesized queries are meaningless or difficult for humans to label, more effort has been devoted to pool-based sampling in recent years. However, in pool-based active learning, querying requires evaluating every unlabeled data point in the pool, which is usually very time-consuming. By contrast, query synthesis has a clear advantage in querying time, which is independent of the pool size. In this paper, we propose a novel framework combining query synthesis and pool-based sampling to accelerate the learning process and overcome the current limitation of query synthesis. The basic idea is to select the data point nearest to the synthesized query as the query point. We also provide two simple strategies for synthesizing informative queries. Moreover, to further speed up querying, we employ clustering techniques on the whole data set to construct a representative unlabeled data pool based on cluster centers. Experiments on several real-world data sets show that our methods have distinct advantages in time complexity and achieve performance similar to that of pool-based uncertainty sampling methods.
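
The basic idea stated in the abstract, selecting the pool point nearest to a synthesized query, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe their two synthesis strategies, so the midpoint rule below (`synthesize_midpoint`) is a hypothetical stand-in, and the function names and toy data are assumptions made for illustration. The linear scan in `nearest_pool_point` is also a simplification; a nearest-neighbor index would presumably be needed to keep querying time from growing with the pool size.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def synthesize_midpoint(pos: Point, neg: Point) -> List[float]:
    """Hypothetical synthesis rule (an assumption, not taken from the paper):
    the midpoint between one labeled point from each class."""
    return [(a + b) / 2.0 for a, b in zip(pos, neg)]


def nearest_pool_point(query: Point, pool: List[Point]) -> Tuple[int, Point]:
    """Return the index and value of the pool point closest to the
    synthesized query under Euclidean distance (plain linear scan)."""
    best = min(range(len(pool)), key=lambda i: math.dist(query, pool[i]))
    return best, pool[best]


# Toy unlabeled pool; in the framework described above this could be a
# reduced pool built from cluster centers.
pool = [(0.0, 0.0), (1.0, 1.0), (2.1, 1.9), (4.0, 4.0)]

# Synthesize a query between a labeled positive and a labeled negative point.
query = synthesize_midpoint((0.0, 0.0), (4.0, 4.0))

# The real data point nearest to the synthetic query is sent for labeling.
idx, point = nearest_pool_point(query, pool)
```

Because the labeled item is always a real pool point, the human annotator never has to label a synthetic and possibly meaningless example, which is the limitation of pure query synthesis that the abstract mentions.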

Keywords :

computational complexity; learning (artificial intelligence); pattern clustering; query processing; cluster centers; clustering techniques; compact training data set; data point nearest selection; informative query synthesis; informative training data set; pool-based active learning; pool-based uncertainty sampling methods; representative point querying; representative unlabeled data pool; time complexity; Accuracy; Approximation algorithms; Educational institutions; Humans; Learning systems; Nearest neighbor searches; Uncertainty;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Neural Networks (IJCNN), The 2012 International Joint Conference on

Conference_Location :

Brisbane, QLD

ISSN :

2161-4393

Print_ISBN :

978-1-4673-1488-6

Electronic_ISBN :

2161-4393

Type :

conf

DOI :

10.1109/IJCNN.2012.6252607

Filename :

6252607

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2773723