DocumentCode :
2627477
Title :
Using upper bounds on attainable discrimination to select discrete valued features
Author :
Lovell, D.R. ; Dance, C.R. ; Niranjan, M. ; Prager, R.W. ; Dalton, K.J.
Author_Institution :
Dept. of Eng., Cambridge Univ., UK
fYear :
1996
fDate :
4-6 Sep 1996
Firstpage :
233
Lastpage :
242
Abstract :
Selection of features that will permit accurate pattern classification is, in general, a difficult task. However, if a particular data set is represented by discrete-valued features, it becomes possible to determine empirically the contribution that each feature makes to the discrimination between classes. We describe how to calculate the maximum discrimination possible in a two-alternative forced-choice decision problem when discrete-valued features are used to represent a given data set. (In this paper, we measure discrimination in terms of the area under the receiver operating characteristic (ROC) curve.) Since this bound corresponds to the upper limit of classification performance achievable by any classifier (with that given data representation), we can use it to assess whether recognition errors are due to a lack of separability in the data or to shortcomings in the classification technique. Compared with training and testing artificial neural networks, the empirical bound on discrimination can be calculated efficiently, allowing an experimenter to decide whether subsequent development of neural network models is warranted. We extend the discrimination bound method so that we can estimate both the maximum and the average discrimination we can expect on unseen test data. These estimation techniques are the basis of a backwards elimination algorithm that can be used to rank features in order of their discriminative power. We use two problems to demonstrate this feature selection process: classification of the Mushroom Database, and a real-world, pregnancy-related medical risk prediction task, namely assessment of the risk of perinatal death.
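The abstract does not give the computation itself, so the following is only a minimal sketch of one standard way to obtain such an empirical bound, not the authors' procedure. It assumes that each distinct combination of feature values is ranked by its empirical fraction of positive examples, which yields the largest area under the ROC curve any scorer can reach on that same data. The function names `max_empirical_auc` and `backward_elimination` are hypothetical, and the elimination loop uses the resubstitution bound rather than the unseen-data estimates the paper describes.

```python
from collections import defaultdict


def max_empirical_auc(rows, labels):
    """Largest AUC attainable on this data by any scorer that sees only
    the given discrete features (assumption: resubstitution estimate)."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [negatives, positives]
    for row, y in zip(rows, labels):
        counts[tuple(row)][1 if y else 0] += 1
    n_neg = sum(c[0] for c in counts.values())
    n_pos = sum(c[1] for c in counts.values())
    if n_neg == 0 or n_pos == 0:
        raise ValueError("both classes must be present")
    # Rank patterns by their empirical fraction of positives.
    ordered = sorted(counts.values(), key=lambda c: c[1] / (c[0] + c[1]))
    concordant = 0.0
    neg_below = 0
    for neg, pos in ordered:
        # Positives here outrank all negatives seen so far and tie
        # (half credit) with negatives sharing the same pattern.
        concordant += pos * (neg_below + 0.5 * neg)
        neg_below += neg
    return concordant / (n_pos * n_neg)


def backward_elimination(rows, labels):
    """Rank feature indices from most to least discriminative by repeatedly
    dropping the feature whose removal leaves the highest bound."""
    remaining = list(range(len(rows[0])))
    eliminated = []
    while len(remaining) > 1:
        def bound_without(f):
            keep = [g for g in remaining if g != f]
            reduced = [[r[g] for g in keep] for r in rows]
            return max_empirical_auc(reduced, labels)
        drop = max(remaining, key=bound_without)
        eliminated.append(drop)
        remaining.remove(drop)
    return remaining + eliminated[::-1]
```

For example, with `rows = [(0, 0), (0, 1), (1, 0), (1, 1)]` and `labels = [0, 0, 1, 1]`, the first feature separates the classes and the second carries no information, so the sketch ranks feature 0 ahead of feature 1.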
Keywords :
decision theory; optimisation; pattern classification; probability; Mushroom Database; attainable discrimination; average discrimination; backwards elimination algorithm; discrete valued features; discrimination bound method; maximum discrimination; pattern classification; perinatal death; pregnancy related medical risk prediction task; receiver operating characteristic; recognition errors; two alternative forced choice decision problem; upper bounds; Area measurement; Artificial neural networks; Character recognition; Force measurement; Gynaecology; Hospitals; Pattern classification; Pregnancy; Spatial databases; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks for Signal Processing [1996] VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop
Conference_Location :
Kyoto
ISSN :
1089-3555
Print_ISBN :
0-7803-3550-3
Type :
conf
DOI :
10.1109/NNSP.1996.548353
Filename :
548353