DocumentCode
3269851
Title
An outlier-aware data clustering algorithm in mixture models
Author
Thang, Nguyen Duc ; Chen Lihui ; Keong, Chan Chee
Author_Institution
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2009
fDate
8-10 Dec. 2009
Firstpage
1
Lastpage
5
Abstract
A robust mixture model-based clustering algorithm using genetic techniques is proposed in this paper. In many engineering and application domains, data collections often contain noisy samples and outliers, which degrade the performance of data mining methods that do not account for them. Classical probabilistic mixture-based clustering is known to be very sensitive to such contamination. To improve its performance, we combine a Genetic Algorithm (GA) with the expectation-maximization (EM) procedure of the classical model. When the trimmed likelihood is used as the fitness function of the GA, highly representative samples are selected and potential outliers are pruned effectively during the learning process. Experiments on both synthetic and real data from different applications show that our approach outperforms the classical mixture model, producing more accurate and reliable results.
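The abstract names a trimmed likelihood as the GA fitness function but this record gives no formulas. The sketch below illustrates one common reading of the idea, assuming a one-dimensional Gaussian mixture: score every sample under the mixture, keep only the best-fitting fraction, and sum their log-likelihoods. The function names and the `keep_fraction` parameter are hypothetical, and the GA and EM steps are not shown; this is not the authors' implementation.

```python
import math


def log_mixture_density(x, weights, means, stds):
    """Log-density of x under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    terms = [
        math.log(w) - 0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def trimmed_log_likelihood(data, weights, means, stds, keep_fraction=0.9):
    """Sum the log-likelihoods of the best-fitting samples only.

    keep_fraction is an assumed tuning parameter: the share of samples
    treated as inliers. Returns (trimmed value, sorted indices of kept samples).
    """
    scores = [log_mixture_density(x, weights, means, stds) for x in data]
    n_keep = max(1, int(round(keep_fraction * len(data))))
    # Rank samples from best to worst fit and keep the top n_keep.
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return sum(scores[i] for i in kept), kept


# Two clusters near 0 and 5, plus one gross outlier at 100.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 100.0]
weights, means, stds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]

full_value, _ = trimmed_log_likelihood(data, weights, means, stds, keep_fraction=1.0)
trimmed_value, kept = trimmed_log_likelihood(data, weights, means, stds, keep_fraction=6 / 7)
print(kept)
print(trimmed_value > full_value)
```

In a GA setting, a value like `trimmed_value` could serve as the fitness of a candidate solution, so that candidates are not penalised for failing to explain a few extreme samples.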
Keywords
data analysis; data mining; expectation-maximisation algorithm; genetic algorithms; pattern clustering; probability; data collection; expectation-maximization procedure; genetic algorithm; outlier-aware data clustering algorithm; potential outliers; probabilistic mixture based clustering; robust mixture model; Clustering algorithms; Data engineering; Distortion measurement; Genetic engineering; Maximum likelihood estimation; Parameter estimation; Robustness; Robust clustering; mixture model; outliers
fLanguage
English
Publisher
IEEE
Conference_Titel
Information, Communications and Signal Processing, 2009. ICICS 2009. 7th International Conference on
Conference_Location
Macau
Print_ISBN
978-1-4244-4656-8
Electronic_ISBN
978-1-4244-4657-5
Type
conf
DOI
10.1109/ICICS.2009.5397571
Filename
5397571