• DocumentCode
    1798411
  • Title
    A survey of distance/similarity measures for categorical data
  • Author
    Alamuri, Madhavi ; Surampudi, Bapi Raju ; Negi, Atul
  • Author_Institution
    Sch. of Comput. & Inf. Sci., Univ. of Hyderabad, Hyderabad, India
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1907
  • Lastpage
    1914
  • Abstract
    The similarity or distance between two objects plays a fundamental role in many data mining tasks such as classification and clustering. Categorical data, unlike numeric data, has no default ordering relation on its attribute values. This makes both devising similarity or distance metrics and performing data mining tasks such as classification and clustering more challenging for categorical data. In this paper we formulate a taxonomy of the distance and similarity measures used with data whose attributes are categorical. We divide the existing measures into two broad classes: context-free and context-sensitive measures. In addition, we suggest a taxonomy of clustering approaches for categorical data and propose a hybrid approach for measuring similarity between objects. We compare the relative strengths and weaknesses of some of the similarity measures and point out future research directions.
  • Keywords
    data mining; pattern classification; pattern clustering; categorical data classification; categorical data clustering; context-free measures; context-sensitive measures; data mining tasks; distance measures; similarity measures; Classification algorithms; Clustering algorithms; Context; Educational institutions; Entropy; Measurement; Partitioning algorithms; Categorical data; Clustering; Similarity; Supervised; Unsupervised;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889941
  • Filename
    6889941