• DocumentCode
    3565632
  • Title
    Comparison of distance measures for clustering data with mix attribute types for Indonesian potential-based regional grouping
  • Author
    Prasetyo, Hermawan ; Purwarianti, Ayu
  • Author_Institution
    Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
  • fYear
    2014
  • Firstpage
    13
  • Lastpage
    18
  • Abstract
    Every region in Indonesia has different potentials that need to be analyzed for national development considerations. This analysis can be accomplished by clustering Indonesian regional potential data, which is collected from the PODES enumeration. This data consists of both numeric and categorical attributes. However, most clustering algorithms can be applied only to either numeric or categorical data. The k-prototypes algorithm, a clustering algorithm that can deal with mixed data types, has limitations, such as its distance measurement. Selecting distance measures properly is thus important to increase its performance. This paper presents a comparison of distance measures for clustering mixed attribute type data. We applied the k-prototypes algorithm with several distance measures to the PODES11-DESA dataset and used the Silhouette index for clustering evaluation. The results show that the best clustering is achieved by applying the Ratio on Mismatches distance for categorical attributes. For numeric attributes, no single distance measure performs best, since the performance of numeric distance measures varies across treatments.
  • Keywords
    distance measurement; pattern clustering; regional planning; Indonesian potential-based regional grouping; Indonesian regional potential data; PODES enumeration; PODES11-DESA dataset; categorical attribute; clustering algorithm; clustering evaluation; distance measurement; k-prototypes algorithm; mix attribute type data clustering; mix data type; national development consideration; numeric attribute; numeric distance measure; silhouette index; Algorithm design and analysis; Chebyshev approximation; Clustering algorithms; Educational institutions; Indexes; Prototypes; Sociology; clustering mix attribute types; distance measures; k-prototypes algorithm; regional potentials;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Systems and Innovation (ICITSI), 2014 International Conference on
  • Type
    conf
  • DOI
    10.1109/ICITSI.2014.7048230
  • Filename
    7048230