DocumentCode :
3124307
Title :
Flexible Fault Tolerant Subspace Clustering for Data with Missing Values
Author :
Günnemann, Stephan ; Müller, Emmanuel ; Raubach, Sebastian ; Seidl, Thomas
Author_Institution :
RWTH Aachen Univ., Aachen, Germany
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
231
Lastpage :
240
Abstract :
In today's applications, data analysis tasks are hindered by many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by cluster detection in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects, which are part of multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition enhancing subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures a robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering. Thus, we prove novel monotonicity properties for an efficient computation of fault tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.
Keywords :
Web sites; computational complexity; data analysis; fault tolerant computing; pattern clustering; Website; cluster detection; computational complexity; data analysis tasks; fault tolerant subspace data clustering; missing values; monotonicity properties; robust parameterization; Adaptation models; Approximation methods; Computational modeling; Data mining; Databases; Fault tolerance; Fault tolerant systems; fault tolerance; incomplete data; missing values; subspace clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.70
Filename :
6137227