• DocumentCode
    1798313
  • Title
    A new distance metric for unsupervised learning of categorical data
  • Author
    Hong Jia ; Yiu-ming Cheung
  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1893
  • Lastpage
    1899
  • Abstract
    A distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. Generally, measuring distance for numerical data is a tractable task, but for categorical data sets it can be a nontrivial problem. This paper therefore presents a new distance metric for categorical data based on the characteristics of categorical values. Specifically, the distance between two values of one attribute is determined by both the frequency probabilities of these two values and the values of other attributes that are highly interdependent with that attribute. Promising experimental results on different real data sets show the effectiveness of the proposed distance metric.
  • Keywords
    data analysis; unsupervised learning; categorical data; categorical values; distance metric; frequency probabilities; Frequency measurement; Hamming distance; Indexes; Joints; Probability; Redundancy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889890
  • Filename
    6889890