DocumentCode :
2131197
Title :
Estimating True and False Positive Rates in Higher Dimensional Problems and Its Data Mining Applications
Author :
Foss, Andrew ; Zaiane, Osmar R.
Author_Institution :
Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
673
Lastpage :
681
Abstract :
If we can estimate the accuracy of our observations, then we can estimate the true and false positive rates over a series of samples in high-dimensional data mining problems. To date, such issues have been largely neglected, and no algorithm has previously been provided to facilitate the computations involved. In high-dimensional data mining tasks, increasing sparsity leads to decreasing true positive rates. Estimating this effect makes it possible to estimate the true size of a class or cluster's membership and to identify the top candidates for the resulting false negatives, while tracking the likelihood of false positives. These estimates of true and false positive rates can also help researchers avoid unnecessary costs by collecting only as many samples as are really needed. We propose an algorithm for these computations, designated the statistical error rate algorithm (SERA), and give an example of its use.
Keywords :
data mining; statistical analysis; false negatives; higher dimensional problems; positive rates; statistical error rate algorithm; Algorithm design and analysis; Clustering algorithms; Conferences; Costs; Diseases; Error analysis; Testing; False positive and negative estimation; High-dimensionality; Microarray
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.38
Filename :
4733993