• DocumentCode
    6756
  • Title
    Discovering Characterizations of the Behavior of Anomalous Subpopulations
  • Author
    Angiulli, Fabrizio; Fassetti, Fabio; Palopoli, Luigi
  • Author_Institution
    DIMES Dept., Univ. of Calabria, Rende, Italy
  • Volume
    25
  • Issue
    6
  • fYear
    2013
  • fDate
    June 2013
  • Firstpage
    1280
  • Lastpage
    1292
  • Abstract
    We consider the problem of discovering the attributes, or properties, that account for the a priori stated abnormality of a group of anomalous individuals (the outliers) with respect to an overall given population (the inliers). To this end, we introduce the notion of exceptional property and define the concept of exceptionality score, which measures the significance of a property. In particular, to single out exceptional properties, we resort to a form of minimum distance estimation to evaluate the badness of fit of the values assumed by the outliers against the probability distribution associated with the values assumed by the inliers. Suitable exceptionality scores are introduced for both numeric and categorical attributes. These scores are designed, from both the analytical and the empirical point of view, to be effective for small samples, as is the case for outliers. We present an algorithm, called EXPREX, for efficiently discovering exceptional properties. The algorithm reduces the required computational effort by skipping many irrelevant numerical intervals and by exploiting suitable pruning rules. The experimental results confirm that our technique provides knowledge that characterizes outliers in a natural manner.
  • Keywords
    data mining; statistical distributions; EXPREX; anomalous subpopulations; categorical attributes; exceptional properties; exceptionality score; inliers; knowledge characterizing outliers; minimum distance estimation; numeric attributes; probability distribution; pruning rules; Algorithm design and analysis; Approximation methods; Distribution functions; Equations; Genetics; Mathematical model; Knowledge discovery; anomaly characterization; mixed-attribute data; unbalanced data
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.58
  • Filename
    6171187