• DocumentCode
    2335880
  • Title

    A fast algorithm to cluster high dimensional basket data

  • Author

    Ordonez, Carlos ; Omiecinski, Edward ; Ezquerra, Norberto

  • Author_Institution
    Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    633
  • Lastpage
    636
  • Abstract
    Clustering is a data mining problem that has received significant attention from the database community. Data set size, dimensionality and sparsity have been identified as aspects that make clustering more difficult. The article introduces a fast algorithm to cluster large binary data sets where data points have high dimensionality and most of their coordinates are zero. This is the case with basket data transactions containing items, which can be represented as sparse binary vectors with very high dimensionality. An experimental section shows the performance, advantages and limitations of the proposed approach.
  • Keywords
    data mining; pattern clustering; very large databases; basket data transactions; data mining problem; data points; data set dimensionality; data set size; database community; fast algorithm; high dimensional basket data clustering; large binary data set clustering; sparse binary vectors; Association rules; Clustering algorithms; Data mining; Databases; Educational institutions; Maximum likelihood estimation; Multidimensional systems; Partitioning algorithms; Sparse matrices; Statistical analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2001.989586
  • Filename
    989586