Title :
MaPle: a fast algorithm for maximal pattern-based clustering
Author :
Pei, Jian ; Zhang, Xiaoling ; Cho, Moonjung ; Wang, Haixun ; Yu, Philip S.
Author_Institution :
State Univ. of New York, Buffalo, NY, USA
Abstract :
Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there can be a huge number of clusters, many of which are redundant, making pattern-based clustering ineffective. On the other hand, previously proposed methods may not be efficient or scalable in mining large databases. We study the problem of maximal pattern-based clustering. Redundant clusters are avoided completely by mining only the maximal pattern-based clusters. MaPle, an efficient and scalable mining algorithm, is developed. It conducts a depth-first, divide-and-conquer search and prunes unnecessary branches smartly. Our extensive performance study on both synthetic and real data sets shows that maximal pattern-based clustering is effective: it reduces the number of clusters substantially. Moreover, MaPle is more efficient and scalable than the previously proposed pattern-based clustering methods in mining large databases.
Keywords :
data mining; divide and conquer methods; optimisation; pattern clustering; search problems; statistical analysis; very large databases; DNA micro-array data analysis; MaPle mining algorithm; automatic recommendation system; divide-and-conquer search; large database mining; maximal pattern-based clustering; real data set; redundant cluster; synthetic data set; target marketing system; Clustering algorithms; Clustering methods; Concrete; DNA; Data analysis; Data mining; Databases; Gene expression; Motion pictures
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
DOI :
10.1109/ICDM.2003.1250928