Regional Information Center for Science and Technology (RICeST) - Active Evaluation of Classifiers on Large Datasets

DocumentCode :

2985146

Title :

Active Evaluation of Classifiers on Large Datasets

Author :

Katariya, N.; Iyer, Amrit; Sarawagi, S.

Author_Institution :

IIT Bombay, Mumbai, India

fYear :

2012

fDate :

10-13 Dec. 2012

Firstpage :

329

Lastpage :

338

Abstract :

The goal of this work is to estimate the accuracy of a classifier on a large unlabeled dataset, given a small labeled set and access to a human labeler. We estimate accuracy and select instances for labeling in a loop, using a continuously refined stratified sampling strategy. To stratify the data, we develop a novel strategy of learning r-bit hash functions that preserve similarity in accuracy values. We show that our algorithm provides better accuracy estimates than existing methods for learning distance-preserving hash functions. Experiments on a wide spectrum of real datasets show that our estimates achieve a 15% to 62% relative reduction in error compared with existing approaches. We also show how to perform stratified sampling on unlabeled data so large that, in an interactive setting, even a single sequential scan is impractical. We present an optimal algorithm for performing importance sampling on a static index over the data that achieves close-to-exact estimates while reading three orders of magnitude less data.
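The stratified accuracy estimate mentioned in the abstract can be illustrated with the textbook stratified-sampling estimator. This is a minimal sketch under stated assumptions, not the authors' algorithm: the strata are assumed to be given already (for example as hash buckets), each labeled instance yields a 0/1 indicator of whether the classifier was correct, and the helper names `stratified_accuracy` and `next_stratum` are hypothetical. The rule for choosing which stratum to label next is a swapped-in Neyman-allocation heuristic, not the selection strategy from the paper.

```python
import math
from typing import Dict, List, Tuple


def stratified_accuracy(stratum_sizes: Dict[str, int],
                        correct: Dict[str, List[int]]) -> Tuple[float, float]:
    """Return (accuracy estimate, estimated variance) from per-stratum 0/1 labels.

    stratum_sizes: number of unlabeled instances N_h in each stratum h.
    correct: for each stratum, 0/1 indicators (1 = classifier was right)
             on the instances a human has labeled so far.
    """
    total = sum(stratum_sizes.values())
    estimate = 0.0
    variance = 0.0
    for h, n_pop in stratum_sizes.items():
        labels = correct.get(h, [])
        n = len(labels)
        if n == 0:
            raise ValueError(f"stratum {h!r} has no labeled instances")
        weight = n_pop / total          # share of the population in stratum h
        mean = sum(labels) / n          # sample accuracy within stratum h
        estimate += weight * mean
        if n > 1:
            # Unbiased sample variance of the 0/1 indicators in stratum h.
            s2 = sum((y - mean) ** 2 for y in labels) / (n - 1)
            # Finite-population correction (1 - n / N_h).
            variance += weight ** 2 * s2 / n * (1 - n / n_pop)
    return estimate, variance


def next_stratum(stratum_sizes: Dict[str, int],
                 correct: Dict[str, List[int]],
                 budget: int) -> str:
    """Hypothetical helper: pick the stratum furthest below its Neyman target.

    Neyman allocation assigns labels in proportion to N_h * sigma_h.
    """
    scores = {}
    for h, n_pop in stratum_sizes.items():
        labels = correct.get(h, [])
        # Laplace-smoothed accuracy, so unlabeled strata still get nonzero spread.
        p = (sum(labels) + 1) / (len(labels) + 2)
        scores[h] = n_pop * math.sqrt(p * (1 - p))
    norm = sum(scores.values())
    deficits = {h: budget * scores[h] / norm - len(correct.get(h, []))
                for h in scores}
    return max(deficits, key=deficits.get)
```

For two strata of 800 and 200 instances whose labeled samples are 4/4 and 2/4 correct, the estimate is 0.8 · 1.0 + 0.2 · 0.5 = 0.9. Weighting each stratum by its population share, rather than pooling all labels, is what keeps the estimate unbiased when labels are drawn unevenly across strata.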

Keywords :

cryptography; importance sampling; learning (artificial intelligence); pattern classification; sampling methods; active classifier evaluation; continuously refined stratified sampling strategy; distance preserving hash function; importance sampling; labeling accuracy; labeling instance; learning strategy; unlabeled dataset classifier; Accuracy; Estimation; Humans; Labeling; Learning systems; Reliability; Vectors; Accuracy estimation; active evaluation; learning hash functions;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2012 IEEE 12th International Conference on Data Mining (ICDM)

Conference_Location :

Brussels

ISSN :

1550-4786

Print_ISBN :

978-1-4673-4649-8

Type :

conf

DOI :

10.1109/ICDM.2012.161

Filename :

6413890

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2985146