Author_Institution : 
DEIS, Univ. of Calabria, Rende, Italy
         
        
            Abstract : 
This work introduces the Prototype-based Domain Description rule (PDD) one-class classifier. PDD is a nearest neighbor-based classifier: it accepts objects on the basis of their nearest neighbor distances in a reference set of objects, also called prototypes. For a suitable choice of the prototype set, the PDD classifier is equivalent to another nearest neighbor-based one-class classifier, namely, the NNDD classifier. Moreover, it generalizes statistical tests for outlier detection. The concept of a PDD consistent subset, which exploits only a selected subset of the training set, is introduced. It is shown that computing a minimum size PDD consistent subset is, in general, not approximable within any constant factor. An algorithm with a logarithmic approximation factor, called the CPDD algorithm, is then introduced for computing a minimum size PDD consistent subset. In order to efficiently manage very large data sets, a variant of the basic rule, called Fast CPDD, is also presented. Experimental results show that the CPDD rule yields a markedly smaller subset than the CNNDD classifier, namely the condensed variant of NNDD, while guaranteeing comparable classification quality; that it is competitive with other one-class classification methods; and that it is suitable for classifying large data sets.
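
As a rough illustration of the acceptance idea described in the abstract, the following minimal sketch accepts an object when the sum of its distances to its k nearest prototypes does not exceed a threshold. The abstract does not give the exact decision rule, so the distance aggregation (a sum), the Euclidean metric, and the names `knn_distance_sum`, `accepts`, `k`, and `theta` are all assumptions made for illustration, not the paper's definition of PDD.

```python
import math


def knn_distance_sum(x, prototypes, k):
    """Sum of Euclidean distances from x to its k nearest prototypes.

    Assumption: Euclidean distance and summation are illustrative choices.
    """
    dists = sorted(math.dist(x, p) for p in prototypes)
    return sum(dists[:k])


def accepts(x, prototypes, k, theta):
    """Accept x if its aggregated nearest-prototype distance is at most theta.

    Hypothetical decision rule; theta is a user-chosen threshold.
    """
    return knn_distance_sum(x, prototypes, k) <= theta


# Toy reference set (made-up points, not data from the paper).
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(accepts((0.1, 0.1), prototypes, k=2, theta=2.0))
print(accepts((5.0, 5.0), prototypes, k=2, theta=2.0))
```

In this toy setting the point close to the prototypes falls under the threshold and the distant point does not. Using a smaller prototype set in place of the full training set is what makes a condensed rule cheaper at classification time.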
         
        
            Keywords : 
learning (artificial intelligence); pattern classification; CNNDD classifier; CPDD algorithm; NNDD classifier; PDD classifier; logarithmic approximation factor algorithm; nearest neighbor-based classifier; one-class classification methods; outlier detection; prototype-based domain description rule; statistical tests; Approximation algorithms; Approximation methods; Classification algorithms; Handheld computers; Measurement; Prototypes; Training; One-class classification; data set condensation; nearest neighbor classification; novelty detection; Algorithms; Cluster Analysis; Databases, Factual; Discriminant Analysis; Pattern Recognition, Automated; Reproducibility of Results