DocumentCode :
2370056
Title :
Regression clustering
Author :
Zhang, Bin
Author_Institution :
Hewlett-Packard Res. Labs., Palo Alto, CA, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
451
Lastpage :
458
Abstract :
Complex distributions in real-world data are often modeled as mixtures of simpler distributions. Clustering is one of the tools for revealing the structure of such a mixture. The same is true of datasets with chosen response variables on which people run regression. Unless clusters with very different response properties are separated, the residue error of the regression is large. The mixture can also misguide input variable selection toward higher complexity. In regression clustering (RC), K (>1) regression functions are applied to the dataset simultaneously; they guide the clustering of the dataset into K subsets, each with a simpler distribution matching its guiding function. Each function is regressed on its own subset of data with a much smaller residue error. Both the regressions and the clustering optimize a common objective function. We present an RC algorithm based on the K-harmonic means clustering algorithm and compare it with existing RC algorithms based on K-means and EM.
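The alternation described in the abstract (regress each function on its subset, then reassign each point to its best-fitting function) can be illustrated with a minimal sketch of the K-means-style variant. This is not the K-harmonic-means algorithm the paper presents. The function name `rc_kmeans`, the choice of linear regression functions, and the random initial partition are assumptions made for illustration only. The shared objective in this sketch is the sum over points of the smallest squared residual among the K functions.

```python
import numpy as np

def rc_kmeans(X, y, K=2, n_iter=50, seed=0):
    """Illustrative K-means-style regression clustering with linear functions.

    Hypothetical helper, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    labels = rng.integers(0, K, size=n)    # assumed: random initial partition
    coefs = np.zeros((K, A.shape[1]))
    for _ in range(n_iter):
        # Regression step: refit each function on its current subset.
        for k in range(K):
            mask = labels == k
            if mask.sum() >= A.shape[1]:   # skip subsets too small to fit
                coefs[k] = np.linalg.lstsq(A[mask], y[mask], rcond=None)[0]
        # Clustering step: assign each point to the function with the
        # smallest squared residual.
        resid = (A @ coefs.T - y[:, None]) ** 2   # shape (n, K)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Common objective: total squared residual to the assigned function.
    objective = resid[np.arange(n), labels].sum()
    return coefs, labels, objective
```

Both steps can only lower or preserve the same objective, which is why the loop can stop once the assignment no longer changes. Like K-means itself, this sketch may settle in a local optimum that depends on the initial partition.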
Keywords :
data mining; error statistics; optimisation; pattern clustering; regression analysis; statistical analysis; statistical distributions; K-harmonic means clustering algorithm; complex distribution; guiding function; input variable selection; real-world data; regression clustering algorithm; regression functions; regression residue error; response variables; Clustering algorithms; Couplings; Data mining; Density functional theory; Input variables; Laboratories; Linear regression; Marketing and sales; Mean square error methods; Partitioning algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250952
Filename :
1250952