Title :
A general approach for automating outliers identification in categorical data
Author :
Taha, Ahmed ; Hadi, A.S.
Author_Institution :
Fac. of Comput. & Inf., Cairo Univ., Giza, Egypt
Abstract :
Outliers identification algorithms for categorical datasets strongly depend on parameter settings that require prior information about the data, e.g., the number of outliers in the data, the maximum length of itemsets, and/or the minimum support for frequent itemsets. These input parameters are classified into two groups: (a) intrinsic parameters, which are required by an outliers detection method to produce a score measure for each object, and (b) decision parameters, which are required for deciding whether an object is an outlier based on the score. In this paper, a general approach for automating the decision parameters of outliers identification in multivariate categorical data is proposed. The added value of the proposed approach is that it can be used by any outliers detection algorithm for categorical data that produces a score measure for each object. We provide a simulation approach for computing critical values for any such outliers detection algorithm. These critical values are distribution-free statistical measures. They are also based on data-driven characteristics, so they can be used to identify outliers from the score measure produced by the algorithm. We illustrate this approach using two outliers detection algorithms. Furthermore, real and synthetic datasets are used to evaluate the performance of the proposed approach.
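The abstract does not spell out the simulation procedure, so the following is only a minimal Python sketch of the general idea of a simulated critical value for a score-producing detector, not the authors' method. Everything in it is an assumption made for illustration: the hypothetical score `frequency_score`, the choice to simulate reference datasets by resampling each attribute independently from its observed frequencies, the use of the maximum score per simulated dataset, and the names `simulate_critical_value`, `flag_outliers`, `alpha`, and `n_sim`.

```python
import random
from collections import Counter


def frequency_score(data):
    """Hypothetical score (an assumption, not from the paper): for each object,
    sum over attributes of (1 - relative frequency of its value).
    Higher scores mean rarer value combinations."""
    n = len(data)
    m = len(data[0])
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(1 - counts[j][row[j]] / n for j in range(m)) for row in data]


def simulate_critical_value(data, score_fn, alpha=0.05, n_sim=200, seed=0):
    """Estimate a critical value as the (1 - alpha) empirical quantile of the
    maximum score over simulated datasets. Each simulated dataset resamples
    every attribute independently from its observed values (assumed scheme)."""
    rng = random.Random(seed)
    n = len(data)
    columns = list(zip(*data))
    max_scores = []
    for _ in range(n_sim):
        sim_cols = [rng.choices(col, k=n) for col in columns]
        sim = list(zip(*sim_cols))
        max_scores.append(max(score_fn(sim)))
    max_scores.sort()
    idx = min(len(max_scores) - 1, int((1 - alpha) * len(max_scores)))
    return max_scores[idx]


def flag_outliers(data, score_fn, alpha=0.05, n_sim=200, seed=0):
    """Return (indices of objects whose score exceeds the critical value,
    the critical value itself)."""
    cv = simulate_critical_value(data, score_fn, alpha, n_sim, seed)
    scores = score_fn(data)
    return [i for i, s in enumerate(scores) if s > cv], cv


if __name__ == "__main__":
    # Toy data: 50 identical objects plus one object with rare values.
    toy = [("a", "x")] * 50 + [("b", "y")]
    flagged, cv = flag_outliers(toy, frequency_score)
    print("critical value:", cv)
    print("flagged indices:", flagged)
```

Any function that maps a dataset to one score per object can be passed as `score_fn`, which mirrors the abstract's claim that the decision step is independent of the detection algorithm. Only the decision parameter (here a significance level `alpha`) is handled; intrinsic parameters of the scoring method still have to be chosen separately.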
Keywords :
object detection; statistical analysis; categorical datasets; distribution-free statistical measures; intrinsic parameters; multivariate categorical data; outliers detection method; outliers identification automation; Algorithm design and analysis; Computational modeling; Detection algorithms; Itemsets; Pollution measurement; Testing; Vectors
Conference_Title :
Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
Conference_Location :
Ifrane
DOI :
10.1109/AICCSA.2013.6616425