DocumentCode :
2851292
Title :
Estimation of false negatives in classification
Author :
Mane, Sandeep ; Srivastava, Jaideep ; Hwang, San-Yih ; Vayghan, Jamshid
Author_Institution :
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
fYear :
2004
fDate :
1-4 Nov. 2004
Firstpage :
475
Lastpage :
478
Abstract :
In many classification problems, such as spam detection and network intrusion, a large number of unlabeled test instances are predicted negative by the classifier. However, the high cost of an expert's time, and the constraints on it, prevent further analysis of the "predicted false" class instances to separate the false negatives from the true negatives. A systematic method is thus required to obtain an estimate of the number of false negatives. A capture-recapture based method can be used to obtain an ML-estimate of false negatives when two or more independent classifiers are available. When independence does not hold, log-linear models can be applied to obtain an estimate of false negatives. However, as shown in this paper, the fewer the dependencies among the classifiers, the better the estimate obtained for false negatives. Thus, ideally, independent classifiers should be used to estimate the false negatives in an unlabeled dataset. Experimental results on the spam dataset from the UCI machine learning repository are presented.
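The two-classifier, independent case mentioned in the abstract can be illustrated with the classical two-sample capture-recapture (Lincoln-Petersen) estimate. The sketch below is not taken from the paper: the function name `estimate_false_negatives` and the example IDs are hypothetical, and it assumes that every instance flagged by either classifier has been confirmed as a true positive and that the two classifiers err independently.

```python
def estimate_false_negatives(flagged_a, flagged_b):
    """Estimate how many positives BOTH classifiers missed.

    Assumptions (illustrative, not from the paper): the inputs are IDs of
    confirmed positives flagged by each classifier, and the classifiers
    are independent.
    """
    a, b = set(flagged_a), set(flagged_b)
    both = len(a & b)      # positives caught by both classifiers
    only_a = len(a - b)    # caught by A only
    only_b = len(b - a)    # caught by B only
    if both == 0:
        raise ValueError("no overlap between classifiers; estimate is undefined")
    # Under independence the 2x2 capture table has odds ratio 1, which gives
    # missed_by_both = only_a * only_b / both (Lincoln-Petersen form).
    return only_a * only_b / both


# Hypothetical example: A flags 6 positives, B flags 5, and 3 are shared.
missed = estimate_false_negatives([1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 8])
print(missed)
```

When the classifiers are positively correlated, the overlap is inflated and this estimate tends to undercount the missed positives, which is consistent with the abstract's recommendation to prefer independent classifiers.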
Keywords :
classification; security of data; unsolicited e-mail; capture-recapture based method; classification problem; false negative estimation; log-linear model; network intrusion; spam detection; Computer science; Costs; Humans; Information management; Intelligent networks; Intrusion detection; Machine learning; Marketing and sales; Testing; Time factors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
Print_ISBN :
0-7695-2142-8
Type :
conf
DOI :
10.1109/ICDM.2004.10048
Filename :
1410339