DocumentCode
2985146
Title
Active Evaluation of Classifiers on Large Datasets
Author
Katariya, N.; Iyer, Amrit; Sarawagi, S.
Author_Institution
IIT Bombay, Mumbai, India
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
329
Lastpage
338
Abstract
The goal of this work is to estimate the accuracy of a classifier on a large unlabeled dataset based on a small labeled set and a human labeler. We seek to estimate accuracy and select instances for labeling in a loop via a continuously refined stratified sampling strategy. For stratifying data we develop a novel strategy of learning r-bit hash functions to preserve similarity in accuracy values. We show that our algorithm provides better accuracy estimates than existing methods for learning distance-preserving hash functions. Experiments on a wide spectrum of real datasets show that our estimates achieve between 15% and 62% relative reduction in error compared to existing approaches. We show how to perform stratified sampling on unlabeled data that is so large that, in an interactive setting, even a single sequential scan is impractical. We present an optimal algorithm for performing importance sampling on a static index over the data that achieves close to exact estimates while reading three orders of magnitude less data.
Keywords
cryptography; importance sampling; learning (artificial intelligence); pattern classification; sampling methods; active classifier evaluation; continuously refined stratified sampling strategy; distance preserving hash function; importance sampling; labeling accuracy; labeling instance; learning strategy; unlabeled dataset classifier; Accuracy; Estimation; Humans; Labeling; Learning systems; Reliability; Vectors; Accuracy estimation; active evaluation; learning hash functions
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
ISSN
1550-4786
Print_ISBN
978-1-4673-4649-8
Type
conf
DOI
10.1109/ICDM.2012.161
Filename
6413890