Title :
Condensed Nearest Neighbor Data Domain Description
Author :
Angiulli, Fabrizio
Author_Institution :
Univ. della Calabria, Rende
Abstract :
A simple yet effective unsupervised classification rule to discriminate between normal and abnormal data is based on accepting test objects whose nearest neighbors' distances in a reference data set, assumed to model normal behavior, lie within a certain threshold. This work investigates the effect of using a subset of the original data set as the reference set of the classifier. With this aim, the concept of a reference-consistent subset is introduced and it is shown that finding the minimum-cardinality reference-consistent subset is intractable. Then, the condensed nearest neighbor domain description (CNNDD) algorithm is described, which computes a reference-consistent subset with only two reference set passes. Experimental results revealed the advantages of condensing the data set and confirmed the effectiveness of the proposed approach. A thorough comparison with related methods was accomplished, pointing out the strengths and weaknesses of one-class nearest-neighbor-based training-set-consistent condensation.
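The acceptance rule in the first sentence of the abstract can be illustrated with a minimal sketch. It is not the paper's CNNDD algorithm and does not compute a reference-consistent subset. It shows only the thresholded nearest-neighbor test. The function name `nn_accepts`, the parameters `k` and `threshold`, the Euclidean metric, and the choice to sum the k nearest distances are all assumptions made for illustration.

```python
import numpy as np

def nn_accepts(reference, x, k=1, threshold=1.0):
    """Accept x if the summed distance to its k nearest reference objects
    is within the threshold (summation and Euclidean metric are assumptions)."""
    reference = np.asarray(reference, dtype=float)
    x = np.asarray(x, dtype=float)
    # Distance from x to every object in the reference set.
    dists = np.linalg.norm(reference - x, axis=1)
    # Keep the k smallest distances and compare their sum to the threshold.
    k_nearest = np.sort(dists)[:k]
    return bool(k_nearest.sum() <= threshold)

# Toy reference set assumed to model normal behavior.
reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
normal_point = nn_accepts(reference, [0.1, 0.1], k=2, threshold=2.0)
abnormal_point = nn_accepts(reference, [5.0, 5.0], k=2, threshold=2.0)
```

Passing a subset of the original data as `reference` corresponds to the setting the abstract studies. How that subset is chosen is left to the paper.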
Keywords :
learning (artificial intelligence); pattern classification; condensed nearest neighbor data domain description; minimum-cardinality reference-consistent subset; reference data set; reference-consistent subset; unsupervised classification; Delay; Nearest neighbor searches; Noise robustness; Object detection; Testing; Training data; classification; data condensation; data domain description; nearest neighbor rule; novelty detection; Algorithms; Artificial Intelligence; Cluster Analysis; Discriminant Analysis; Information Storage and Retrieval; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2007.1086