Title of article
Hierarchical Clustering Using Non-Greedy Principal Direction Divisive Partitioning
Author/Authors
Nilsson, Martin
Issue Information
Journal issue with serial number, year 2002
From page
311
To page
0
Abstract
We present a non-greedy version of the recently published Principal Direction Divisive Partitioning (PDDP) algorithm. The PDDP algorithm creates a hierarchical taxonomy of a data set by successively splitting the data into sub-clusters. At each level, the cluster with the largest variance is split by a hyperplane orthogonal to its leading principal component. The PDDP algorithm is known to produce high-quality clusters, especially when applied to high-dimensional data such as document-word feature matrices. It also scales well with both the size and the dimensionality of the data set. However, at each level only the locally optimal choice of splitting is considered, which often leads to a non-optimal global partitioning of the data at a later stage. The non-greedy version of the PDDP algorithm (NGPDDP) presented in this paper addresses this problem: at each level, multiple alternative splitting strategies are considered. Results from applying the algorithm to generated and real data (feature vectors from sets of text documents) are presented. The results show substantial improvements in cluster quality.
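The greedy PDDP baseline described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and does not cover the non-greedy NGPDDP variant. It assumes that the splitting hyperplane passes through the cluster centroid and that "variance" is measured as total scatter (the sum of squared distances to the centroid). The function names `pddp_split`, `scatter`, and `pddp` are hypothetical.

```python
import numpy as np


def pddp_split(X):
    """Return a boolean mask splitting the rows of X into two sides.

    Assumption: the hyperplane passes through the centroid and is
    orthogonal to the leading principal direction of the cluster.
    """
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    return projections <= 0


def scatter(X):
    """Total scatter: sum of squared distances to the centroid (assumed variance measure)."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def pddp(X, n_clusters):
    """Greedy divisive clustering; returns a list of index arrays into X."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # Only clusters with at least two points can be split.
        splittable = [i for i, idx in enumerate(clusters) if len(idx) > 1]
        if not splittable:
            break
        # Greedy step: always pick the cluster with the largest scatter.
        k = max(splittable, key=lambda i: scatter(X[clusters[i]]))
        idx = clusters.pop(k)
        mask = pddp_split(X[idx])
        if mask.all() or not mask.any():
            # Degenerate split (e.g. all points identical): stop here.
            clusters.append(idx)
            break
        clusters.extend([idx[mask], idx[~mask]])
    return clusters
```

The `max(...)` line is the greedy choice that the abstract identifies as the weakness: only the single locally best cluster is split at each level. A non-greedy variant would instead evaluate several candidate splits before committing to one.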
Keywords
Clustering , taxonomy , classification , PCA
Journal title
INFORMATION RETRIEVAL
Serial Year
2002
Record number
89802