DocumentCode :
2985146
Title :
Active Evaluation of Classifiers on Large Datasets
Author :
Katariya, N.; Iyer, Amrit; Sarawagi, S.
Author_Institution :
IIT Bombay, Mumbai, India
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
329
Lastpage :
338
Abstract :
The goal of this work is to estimate the accuracy of a classifier on a large unlabeled dataset based on a small labeled set and a human labeler. We seek to estimate accuracy and select instances for labeling in a loop via a continuously refined stratified sampling strategy. For stratifying data, we develop a novel strategy of learning r-bit hash functions to preserve similarity in accuracy values. We show that our algorithm provides better accuracy estimates than existing methods for learning distance-preserving hash functions. Experiments on a wide spectrum of real datasets show that our estimates achieve between 15% and 62% relative reduction in error compared to existing approaches. We show how to perform stratified sampling on unlabeled data that is so large that, in an interactive setting, even a single sequential scan is impractical. We present an optimal algorithm for performing importance sampling on a static index over the data that achieves close to exact estimates while reading three orders of magnitude less data.
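As a rough illustration of the stratified accuracy estimate the abstract refers to, the following minimal sketch computes the textbook stratified estimator: per-stratum accuracy weighted by stratum size. It is not the paper's algorithm. The learned hash functions, the refinement loop, and the importance-sampling index are not shown, and the names `stratified_accuracy`, `stratum_sizes`, and `labeled` are hypothetical. Falling back to the pooled accuracy for strata with no labels yet is also an assumption made here.

```python
from collections import defaultdict


def stratified_accuracy(stratum_sizes, labeled):
    """Textbook stratified estimate of classifier accuracy.

    stratum_sizes: dict mapping stratum id -> number of instances in it.
    labeled: list of (stratum id, is_correct) pairs for instances that a
             human has labeled so far.
    """
    if not labeled:
        raise ValueError("need at least one labeled instance")

    hits = defaultdict(int)    # correct predictions seen per stratum
    counts = defaultdict(int)  # labeled instances seen per stratum
    for stratum, correct in labeled:
        counts[stratum] += 1
        hits[stratum] += int(correct)

    # Assumption: strata without any labels fall back to the pooled accuracy.
    pooled = sum(hits.values()) / sum(counts.values())

    total = sum(stratum_sizes.values())
    estimate = 0.0
    for stratum, size in stratum_sizes.items():
        n = counts.get(stratum, 0)
        acc = hits[stratum] / n if n else pooled
        estimate += (size / total) * acc  # weight by stratum share
    return estimate
```

For example, with strata of 900 and 100 instances and observed accuracies of 1.0 and 0.5, the estimate weights the larger stratum accordingly, giving 0.9 × 1.0 + 0.1 × 0.5.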
Keywords :
cryptography; importance sampling; learning (artificial intelligence); pattern classification; sampling methods; active classifier evaluation; continuously refined stratified sampling strategy; distance preserving hash function; labeling accuracy; labeling instance; learning strategy; unlabeled dataset classifier; Accuracy; Estimation; Humans; Labeling; Learning systems; Reliability; Vectors; Accuracy estimation; active evaluation; learning hash functions
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.161
Filename :
6413890