• DocumentCode
    1459448
  • Title
    Integrating Clustering and Supervised Learning for Categorical Data Analysis
  • Author
    Maulik, Ujjwal ; Bandyopadhyay, Sanghamitra ; Saha, Indrajit
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Jadavpur Univ., Kolkata, India
  • Volume
    40
  • Issue
    4
  • fYear
    2010
  • fDate
    7/1/2010 12:00:00 AM
  • Firstpage
    664
  • Lastpage
    675
  • Abstract
    Fuzzy clustering of categorical data, where no natural ordering exists among the elements of a categorical attribute domain, is an important problem in exploratory data analysis. As a result, a few clustering algorithms focusing on categorical data have been proposed. In this paper, a modified differential evolution (DE)-based fuzzy c-medoids (FCMdd) clustering algorithm for categorical data is proposed. The algorithm combines local and global information with adaptive weighting. The performance of the proposed method is compared with that of methods using a genetic algorithm, simulated annealing, and the classical DE technique, as well as the FCMdd, fuzzy k-modes, and average linkage hierarchical clustering algorithms, on four artificial and four real-life categorical data sets. A statistical test has been carried out to establish the statistical significance of the proposed method. To improve the result further, the clustering method is integrated with a support vector machine (SVM), a well-known technique for supervised learning. A fraction of the data points, selected from different clusters based on their proximity to the respective medoids, is used for training the SVM. The cluster assignments of the remaining points are then determined using the trained classifier. The superiority of the integrated clustering and supervised learning approach has been demonstrated.
  • Keywords
    data analysis; learning (artificial intelligence); pattern clustering; statistical testing; support vector machines; categorical data analysis; classical DE technique; differential evolution based fuzzy c-medoids clustering; fuzzy clustering; genetic algorithm; simulated annealing; statistical testing; supervised learning; support vector machine; Categorical data; differential evolution (DE); fuzzy clustering; genetic algorithm; simulated annealing (SA); support vector machine (SVM)
  • fLanguage
    English
  • Journal_Title
    Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1083-4427
  • Type
    jour
  • DOI
    10.1109/TSMCA.2010.2041225
  • Filename
    5440907