• DocumentCode
    1798135
  • Title

    A hybrid coupled k-nearest neighbor algorithm on imbalance data

  • Author

    Chunming Liu ; Longbing Cao ; Philip S. Yu

  • Author_Institution
    Adv. Analytics Inst., Univ. of Technol. Sydney, Sydney, NSW, Australia
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    2011
  • Lastpage
    2018
  • Abstract
    State-of-the-art classification algorithms rarely consider the relationships between attributes in a data set and instead assume that the attributes are independent of each other (the IID assumption). In real-world data, however, attributes interact to some degree through explicit or implicit relationships. Although classifiers for class-balanced data are relatively well developed, classifying class-imbalanced data is not straightforward, especially for mixed-type data that has both categorical and numerical features. Limited research has been conducted on class-imbalanced data. Some algorithms synthesize or remove instances to make the class sizes comparable, which may change the inherent data structure or introduce noise into the source data. Distance- or similarity-based algorithms, meanwhile, ignore the relationships between features when computing similarity. This paper proposes a hybrid coupled k-nearest neighbor classification algorithm (HC-kNN) for mixed-type data. It discretizes numerical features so that the inter-coupling similarity can be applied to them as it is to categorical features, then combines this coupled similarity with the original similarity or distance to overcome the shortcomings of the previous algorithms. The experimental results demonstrate that the proposed algorithm achieves higher average performance than the relevant algorithms (e.g., the variants of kNN, Decision Tree, SMOTE, and NaiveBayes).
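    The pipeline outlined in the abstract (discretize numerical features, compute a coupling-aware similarity on the resulting symbolic values, then blend it with the ordinary distance before the kNN vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the equal-width binning, the conditional-overlap stand-in for the inter-coupling similarity, the `alpha` blending weight, and every function name below are assumptions made for illustration only.

```python
import math
from collections import Counter


def equal_width_bin(value, lo, hi, n_bins):
    """Map a numerical value to an integer bin label in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return max(0, min(idx, n_bins - 1))


def conditional_overlap(col_j, col_k, a, b):
    """Overlap of feature k's value distribution given j == a versus j == b."""
    dist_a = Counter(w for v, w in zip(col_j, col_k) if v == a)
    dist_b = Counter(w for v, w in zip(col_j, col_k) if v == b)
    n_a, n_b = sum(dist_a.values()), sum(dist_b.values())
    if n_a == 0 or n_b == 0:
        return 0.0
    values = set(dist_a) | set(dist_b)
    return sum(min(dist_a[w] / n_a, dist_b[w] / n_b) for w in values)


def coupled_similarity(columns, j, a, b):
    """Average conditional overlap of values a, b of feature j over the other features."""
    if a == b:
        return 1.0
    others = [k for k in range(len(columns)) if k != j]
    if not others:
        return 0.0
    total = sum(conditional_overlap(columns[j], columns[k], a, b) for k in others)
    return total / len(others)


def hybrid_knn_predict(num_train, cat_train, labels, num_query, cat_query,
                       k=3, alpha=0.5, n_bins=3):
    """Classify one query by majority vote over the k smallest hybrid distances."""
    # Assumption: equal-width binning turns numerical columns into symbolic ones.
    num_cols = list(zip(*num_train))
    ranges = [(min(c), max(c)) for c in num_cols]
    binned_cols = [[equal_width_bin(v, lo, hi, n_bins) for v in c]
                   for c, (lo, hi) in zip(num_cols, ranges)]
    binned_query = [equal_width_bin(v, lo, hi, n_bins)
                    for v, (lo, hi) in zip(num_query, ranges)]
    # Symbolic view: original categorical columns plus binned numerical columns.
    columns = [list(c) for c in zip(*cat_train)] + binned_cols
    query = list(cat_query) + binned_query

    scored = []
    for i, label in enumerate(labels):
        euclid = math.dist(num_train[i], num_query)  # ordinary numerical distance
        sims = [coupled_similarity(columns, j, columns[j][i], query[j])
                for j in range(len(columns))]
        coupled_dist = 1.0 - sum(sims) / len(sims)
        # Assumption: a simple weighted sum blends the two distance terms.
        scored.append((alpha * euclid + (1.0 - alpha) * coupled_dist, label))
    scored.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy imbalanced data: three majority instances, two minority instances.
num_train = [[1.0], [1.2], [0.9], [5.0], [5.2]]
cat_train = [["a"], ["a"], ["a"], ["b"], ["b"]]
labels = ["maj", "maj", "maj", "min", "min"]
prediction = hybrid_knn_predict(num_train, cat_train, labels, [5.1], ["b"])
```

    In this sketch the Euclidean term is unscaled, so numerical features would normally be normalized first to keep the two terms of the blended distance comparable.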
  • Keywords
    pattern classification; HC-kNN; IID; categorical features; class-balanced data; class-imbalanced data; classifier; data structure; distance based algorithm; explicit relationship; hybrid coupled k-nearest neighbor classification algorithm; imbalance data; implicit relationship; mixed type data; numerical features; similarity based algorithm; Algorithm design and analysis; Classification algorithms; Clouds; Couplings; Size measurement; Training; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type

    conf

  • DOI
    10.1109/IJCNN.2014.6889798
  • Filename
    6889798