• DocumentCode
    3124307
  • Title

    Flexible Fault Tolerant Subspace Clustering for Data with Missing Values

  • Author

    Günnemann, Stephan ; Müller, Emmanuel ; Raubach, Sebastian ; Seidl, Thomas

  • Author_Institution
    RWTH Aachen Univ., Aachen, Germany
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    231
  • Lastpage
    240
  • Abstract
    In today's applications, data analysis tasks are hindered by many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by cluster detection in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects, which are part of multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition enhancing subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures a robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering. Thus, we prove novel monotonicity properties for an efficient computation of fault tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.
  • Keywords
    Web sites; computational complexity; data analysis; fault tolerant computing; pattern clustering; Website; cluster detection; data analysis tasks; fault tolerant subspace data clustering; missing values; monotonicity properties; robust parameterization; Adaptation models; Approximation methods; Computational modeling; Data mining; Databases; Fault tolerance; Fault tolerant systems; incomplete data; subspace clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2011.70
  • Filename
    6137227