  • DocumentCode
    397064
  • Title
    A general framework for clustering high-dimensional datasets
  • Author
    Yanchang, Zhao ; Junde, Song
  • Author_Institution
    Beijing Univ. of Posts & Telecommun., China
  • Volume
    2
  • fYear
    2003
  • fDate
    4-7 May 2003
  • Firstpage
    1091
  • Abstract
    In many fields, the datasets used in data mining applications are usually high-dimensional. Most existing clustering algorithms are effective and efficient when the dimensionality is low, but their effectiveness and performance degrade when the data space is high-dimensional. One reason is that their complexity increases exponentially with the dimensionality. To address this problem, we propose a general framework for clustering high-dimensional datasets. When combined with our framework, common clustering algorithms can be applied to high-dimensional datasets efficiently. In our framework, a high-dimensional clustering task is broken into several one- or two-dimensional clustering phases, each of which involves only one or two dimensions. In this way, common algorithms for clustering low-dimensional datasets can be used to process high-dimensional ones. In addition, attributes of different types can be processed with different algorithms in separate phases, so datasets of hybrid data types can be handled easily. Our experiments demonstrate the efficiency and effectiveness of the framework.
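    The abstract does not spell out the framework's exact procedure, so what follows is only a minimal sketch of the phase-by-phase idea under stated assumptions: the clustering starts with all points in one group, and each phase looks at a single dimension and splits every current group with a one-dimensional clusterer. A hypothetical gap-based splitter (`gap_clusterer`) stands in for the numeric algorithm and grouping by value (`category_clusterer`) for the categorical one, illustrating how hybrid data types can use different algorithms in separate phases. All function names, the `max_gap` and `min_size` parameters, and the toy data are illustrative assumptions, not the authors' method; two-dimensional phases are not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

Value = Union[float, str]
# Assumed interface: a 1-D clusterer maps a list of values to one label per value.
Clusterer1D = Callable[[List[Value]], List[int]]


def gap_clusterer(max_gap: float) -> Clusterer1D:
    """Hypothetical numeric 1-D clusterer: sort the values and start a new
    cluster wherever two neighbouring values are more than max_gap apart."""
    def cluster(values: List[Value]) -> List[int]:
        order = sorted(range(len(values)), key=lambda i: values[i])
        labels = [0] * len(values)
        current = 0
        for prev, idx in zip(order, order[1:]):
            if values[idx] - values[prev] > max_gap:
                current += 1
            labels[idx] = current
        return labels
    return cluster


def category_clusterer(values: List[Value]) -> List[int]:
    """Hypothetical categorical 1-D clusterer: each distinct value is a group."""
    ids: Dict[Value, int] = {}
    return [ids.setdefault(v, len(ids)) for v in values]


def phased_clustering(rows: Sequence[Sequence[Value]],
                      phases: Sequence[Tuple[int, Clusterer1D]],
                      min_size: int = 1) -> List[List[int]]:
    """Refine one all-points group phase by phase; each phase reads only one
    dimension and splits every current group with that phase's clusterer."""
    groups: List[List[int]] = [list(range(len(rows)))]
    for dim, clusterer in phases:
        refined: List[List[int]] = []
        for group in groups:
            labels = clusterer([rows[i][dim] for i in group])
            parts: Dict[int, List[int]] = {}
            for i, label in zip(group, labels):
                parts.setdefault(label, []).append(i)
            refined.extend(parts.values())
        groups = refined
    # Assumption: groups smaller than min_size are treated as noise and dropped.
    return [g for g in groups if len(g) >= min_size]


# Toy hybrid dataset: two numeric attributes and one categorical attribute.
rows = [
    (1.0, 10.0, "a"),
    (1.2, 10.5, "a"),
    (1.1, 10.2, "b"),
    (5.0, 10.1, "a"),
    (5.3, 30.0, "a"),
    (5.1, 30.4, "a"),
]
phases = [(0, gap_clusterer(1.0)), (1, gap_clusterer(2.0)), (2, category_clusterer)]
print(phased_clustering(rows, phases, min_size=2))  # [[0, 1], [4, 5]]
```

    Because each phase in this sketch only sorts or scans the values of one dimension within each group, its cost grows with the number of phases rather than exponentially with the dimensionality. The order of the phases and the per-dimension thresholds are free choices in the sketch.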
  • Keywords
    data mining; pattern clustering; clustering high-dimensional datasets; data mining applications; hybrid data types; two-dimensional clustering phases; Algorithm design and analysis; Clustering algorithms; Data mining; Degradation; Performance analysis; Telecommunications
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on
  • ISSN
    0840-7789
  • Print_ISBN
    0-7803-7781-8
  • Type
    conf
  • DOI
    10.1109/CCECE.2003.1226086
  • Filename
    1226086