• DocumentCode
    3269851
  • Title

    An outlier-aware data clustering algorithm in mixture models

  • Author

    Thang, Nguyen Duc ; Chen Lihui ; Keong, Chan Chee

  • Author_Institution
    Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2009
  • fDate
    8-10 Dec. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    A robust mixture model-based clustering algorithm using genetic techniques is proposed in this paper. In many engineering and application domains, data collections often contain noisy samples and outliers, which degrade the performance of data mining methods that do not account for them. Classical probabilistic mixture-based clustering is known to be very sensitive to such situations. To improve its performance, we combine a Genetic Algorithm (GA) with the expectation-maximization (EM) procedure of the classical model. When the trimmed likelihood is used as the fitness function of the GA, highly representative samples are selected and potential outliers are pruned effectively during the learning process. Experiments on both synthetic and real data for different applications show that our approach outperforms the classical mixture model, producing more accurate and reliable results.
  • Keywords
    data analysis; data mining; expectation-maximisation algorithm; genetic algorithms; pattern clustering; probability; data collection; data mining; expectation-maximization procedure; genetic algorithm; outlier-aware data clustering algorithm; potential outliers; probabilistic mixture based clustering; robust mixture model; Clustering algorithms; Data analysis; Data engineering; Data mining; Distortion measurement; Genetic algorithms; Genetic engineering; Maximum likelihood estimation; Parameter estimation; Robustness; Robust clustering; genetic algorithm; mixture model; outliers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Communications and Signal Processing, 2009. ICICS 2009. 7th International Conference on
  • Conference_Location
    Macau
  • Print_ISBN
    978-1-4244-4656-8
  • Electronic_ISBN
    978-1-4244-4657-5
  • Type

    conf

  • DOI
    10.1109/ICICS.2009.5397571
  • Filename
    5397571
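
The abstract names the trimmed likelihood as the GA fitness function but gives no formula. As a rough illustration only, the sketch below scores a one-dimensional Gaussian mixture by summing the `keep` largest per-sample log-likelihoods, so the lowest-scoring samples are treated as candidate outliers. This is a generic trimmed log-likelihood, not necessarily the paper's exact definition; the function names, the 1-D Gaussian components, and the `keep` parameter are all assumptions made for the example, and the GA and EM steps are not shown.

```python
import math


def log_gaussian(x, mean, std):
    """Log-density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def mixture_log_density(x, weights, means, stds):
    """Log-density of x under a 1-D Gaussian mixture (weights must be > 0).

    Uses the log-sum-exp trick so far-away points do not underflow to log(0).
    """
    terms = [
        math.log(w) + log_gaussian(x, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def trimmed_log_likelihood(data, weights, means, stds, keep):
    """Sum of the `keep` largest per-sample log-likelihoods.

    Returns (score, kept_indices). Indices not in kept_indices are the
    samples trimmed away as candidate outliers. Hypothetical helper,
    written for illustration only.
    """
    scores = [mixture_log_density(x, weights, means, stds) for x in data]
    order = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])
    return sum(scores[i] for i in kept), kept


# Toy data: two clusters near 0 and 5, plus one far-away point.
data = [0.1, -0.2, 0.05, 5.1, 4.9, 50.0]
weights, means, stds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]

score, kept = trimmed_log_likelihood(data, weights, means, stds, keep=5)
trimmed = [i for i in range(len(data)) if i not in kept]
print(score, kept, trimmed)
```

With `keep=5`, the sample at 50.0 is the one left out, and the trimmed score is higher than the full log-likelihood over all six samples. A GA-based search of the kind the abstract describes could use such a score to compare candidate parameter sets or sample subsets, but how the paper does so is not stated in this record.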