• DocumentCode
    679541
  • Title
    Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data
  • Author
    Aksehirli, Emin ; Goethals, Bart ; Muller, E. ; Vreeken, Jilles

  • Author_Institution
    Univ. of Antwerp, Antwerp, Belgium
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    937
  • Lastpage
    942
  • Abstract
    The analysis of high dimensional data comes with many intrinsic challenges. In particular, cluster structures become increasingly hard to detect when the data includes dimensions irrelevant to the individual clusters. With increasing dimensionality, distances between pairs of objects become very similar, and hence meaningless for knowledge discovery. In this paper we propose Cartification, a new transformation to circumvent this problem. We transform each object into an itemset that represents the neighborhood of the object. We do this for multiple views on the data, resulting in multiple neighborhoods per object. This transformation enables us to preserve the essential pairwise similarities of objects over multiple views, and hence to improve knowledge discovery in high dimensional data. Our experiments show that frequent itemset mining on the cartified data outperforms competing clustering approaches on the original data space, including traditional clustering, random projections, principal component analysis, subspace clustering, and clustering ensembles.
  • Keywords
    data mining; pattern clustering; cartification; clustering ensemble; high dimensional data mining; knowledge discovery; neighborhood preserving transformation; principal component analysis; random projections; subspace clustering; Algorithm design and analysis; Clustering algorithms; Data mining; Itemsets; Noise; Noise measurement; clustering; frequent itemset mining; high dimensional data; subspace projections; transformation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2013.146
  • Filename
    6729578
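
The transformation described in the abstract (each object becomes an itemset representing its neighborhood, once per view) can be illustrated with a minimal sketch. The abstract does not specify how neighborhoods or views are defined, so the choices here are assumptions made only for illustration: each single dimension is treated as one view, and a neighborhood is the k nearest objects in that dimension, including the object itself. The names `cartify` and `support` are hypothetical and not taken from the paper.

```python
from typing import List, Set


def cartify(data: List[List[float]], k: int) -> List[List[Set[int]]]:
    """Return, for each dimension (assumed view), one itemset per object.

    Each itemset holds the ids of the k objects closest to that object
    when distance is measured in that single dimension only. The object
    itself is always included because its distance to itself is zero.
    """
    n = len(data)
    dims = len(data[0]) if n else 0
    views: List[List[Set[int]]] = []
    for d in range(dims):
        transactions: List[Set[int]] = []
        for i in range(n):
            # Rank all objects by 1-D distance to object i; ties broken by id.
            ranked = sorted(
                range(n), key=lambda j: (abs(data[j][d] - data[i][d]), j)
            )
            transactions.append(set(ranked[:k]))
        views.append(transactions)
    return views


def support(transactions: List[Set[int]], itemset: Set[int]) -> int:
    """Count the transactions (neighborhoods) that contain every id in itemset."""
    return sum(1 for t in transactions if itemset <= t)


# Four objects with two dimensions each (made-up values for illustration).
points = [[0.0, 5.0], [1.0, 9.0], [3.0, 1.0], [10.0, 5.5]]
views = cartify(points, k=2)
```

In this sketch, a set of object ids that appears together in many neighborhoods of one view, that is, an itemset with high `support`, is a candidate cluster in that view. A frequent itemset miner would enumerate such sets; that step is not shown here.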