• DocumentCode
    256372
  • Title

    DPM: Fast and scalable clustering algorithm for large scale high dimensional datasets

  • Author

    Ghanem, T.F. ; Elkilani, W.S. ; Ahmed, H.S. ; Hadhoud, M.M.

  • Author_Institution
    Inf. Technol. Dept., Menofiya Univ., Shebin-El-Kom, Egypt
  • fYear
    2014
  • fDate
    22-23 Dec. 2014
  • Firstpage
    71
  • Lastpage
    79
  • Abstract
    Clustering multi-density, large-scale, high-dimensional datasets is a challenging task due to the high time complexity of most clustering algorithms. Modern data collection tools produce large amounts of data, so fast algorithms are a vital requirement for clustering such data. In this paper, a fast clustering algorithm called Dimension-based Partitioning and Merging (DPM) is proposed. In DPM, data is first partitioned into small dense volumes during the successive processing of dataset dimensions. Then, noise is filtered out using the dimensional densities of the generated partitions. Finally, a merging process is invoked to construct clusters based on partition boundary data samples. DPM automatically detects the number of data clusters based on three insensitive tuning parameters, which reduces the burden of its use. Performance evaluation on different datasets shows that the proposed algorithm is fast and accurate compared with competing clustering algorithms.
  • Keywords
    computational complexity; data acquisition; data mining; DPM; data collection tool; density-based clustering; dimension-based-partitioning-and-merging; insensitive tuning parameter; large scale high dimensional datasets; scalable clustering algorithm; time complexity; TV; Clustering; subspace clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering & Systems (ICCES), 2014 9th International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4799-6593-9
  • Type

    conf

  • DOI
    10.1109/ICCES.2014.7030932
  • Filename
    7030932