Title :
Classification using Dirichlet priors when the training data are mislabeled
Author :
Lynch, Robert S., Jr. ; Willett, Peter K.
Author_Institution :
Naval Undersea Warfare Center, Newport, RI, USA
Abstract :
The average probability of error is used to demonstrate the performance of a Bayesian classification test, referred to as the combined Bayes test (CBT), when the training data of each class are mislabeled. The CBT combines the information in discrete training and test data to infer symbol probabilities, where a uniform Dirichlet prior (i.e., a noninformative prior of complete ignorance) is assumed for all classes. Using this prior, it is shown that classification performance degrades when mislabeling exists in the training data, with a severity that depends on the values of the mislabeling probabilities. However, an increase in the mislabeling probabilities is also shown to cause an increase in M* (i.e., the best quantization fineness). Further, even when the actual mislabeling probabilities are known by the CBT, it is not possible to achieve the classification performance obtainable without mislabeling.
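The setting in the abstract can be illustrated with a minimal sketch. It is not the CBT from the paper: it is a simplified plug-in rule that estimates each class's symbol probabilities by the posterior mean under a uniform Dirichlet prior, (n_i + 1) / (N + M), and assigns a test symbol to the class with the larger estimate. The symbol distributions P0 and P1, the equal class priors, the sample sizes, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical true symbol probabilities for two classes (illustration only).
P0 = [0.4, 0.3, 0.2, 0.1]
P1 = [0.1, 0.2, 0.3, 0.4]
M = len(P0)  # number of discrete symbols (quantization levels)


def posterior_mean(counts):
    """Posterior-mean symbol probabilities under a uniform Dirichlet prior.

    With every Dirichlet parameter equal to 1, the estimate for symbol i is
    (n_i + 1) / (N + M), where N = sum(counts) and M = len(counts).
    """
    total = sum(counts)
    m = len(counts)
    return [(c + 1) / (total + m) for c in counts]


def draw(n, rng):
    """Draw n (label, symbol) pairs, assuming equal class priors."""
    data = []
    for _ in range(n):
        k = rng.randrange(2)
        s = rng.choices(range(M), weights=P1 if k else P0)[0]
        data.append((k, s))
    return data


def flip_labels(labeled, flip_prob, rng):
    """Swap each label (0 <-> 1) independently with probability flip_prob."""
    return [(1 - k if rng.random() < flip_prob else k, s) for k, s in labeled]


def classify(symbol, probs0, probs1):
    """Pick the class whose estimated probability of the symbol is larger."""
    return 0 if probs0[symbol] >= probs1[symbol] else 1


def error_rate(flip_prob, n_train=200, n_test=5000, seed=0):
    """Empirical error on clean test data after training on mislabeled data."""
    rng = random.Random(seed)
    train = flip_labels(draw(n_train, rng), flip_prob, rng)
    estimates = []
    for k in (0, 1):
        counts = [0] * M
        for label, s in train:
            if label == k:
                counts[s] += 1
        estimates.append(posterior_mean(counts))
    test = draw(n_test, rng)
    errors = sum(1 for k, s in test if classify(s, estimates[0], estimates[1]) != k)
    return errors / n_test


if __name__ == "__main__":
    for q in (0.0, 0.1, 0.3, 0.45):
        print(f"mislabeling probability {q:.2f}: error rate {error_rate(q):.3f}")
```

Running the script prints the empirical error rate for a few mislabeling probabilities, which gives a rough feel for the degradation the abstract describes. The sketch does not model the quantization fineness M* or the case where the mislabeling probabilities are known to the classifier.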
Keywords :
Bayes methods; error statistics; quantisation (signal); signal classification; Bayesian classification test; average error probability; best quantization fineness; classification performance; combined Bayes test; discrete test data; discrete training data; intersymbol probabilities; mislabeled training data; mislabeling probabilities; noninformative prior; training data; uniform Dirichlet priors; Bayesian methods; Contracts; Degradation; Labeling; Laboratories; Pattern recognition; Quantization; Random variables; Testing; Training data;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location :
Phoenix, AZ
Print_ISBN :
0-7803-5041-3
DOI :
10.1109/ICASSP.1999.761387