Title :
Clustering GML documents using maximal frequent induced subtrees
Author :
Zhu, Ying-wen ; Ji, Gen-lin ; Sun, Qin-hong
Author_Institution :
Dept. of Comput. Found. Teaching, Sanjiang Univ., Nanjing, China
Abstract :
This paper presents TBCClustering, an algorithm for clustering GML documents using maximal frequent induced subtree patterns. TBCClustering mines the maximal frequent induced subtrees from the structural information of GML documents, determines the best minimum support automatically, and then chooses a set of subtree patterns to form the optimistic clustering features. Finally, it uses the CLOPE algorithm to cluster the GML documents by these features, without requiring the number of clusters to be specified in advance. Experimental results show that TBCClustering is more effective and efficient than PBClustering.
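The final clustering step can be illustrated with a minimal sketch of the general CLOPE scheme, which is not the authors' implementation. It assumes each GML document has already been reduced to a non-empty set of feature identifiers (hypothetical stand-ins for the selected subtree patterns), and it assumes a repulsion value r = 2.0 that the abstract does not state. CLOPE places each document in the cluster, or a new empty cluster, that most increases the sum of S(C) * |C| / W(C)^r, where S is the total number of feature occurrences in the cluster and W is its number of distinct features, so the number of clusters is not fixed in advance.

```python
from collections import Counter


class Cluster:
    def __init__(self):
        self.n = 0            # number of documents in the cluster
        self.size = 0         # S: total feature occurrences
        self.occ = Counter()  # occurrences per feature; W = len(self.occ)

    def value(self, r):
        # Contribution of this cluster to the profit numerator.
        return 0.0 if self.n == 0 else self.size * self.n / len(self.occ) ** r

    def delta_add(self, doc, r):
        # Change in the profit numerator if doc were added to this cluster.
        width = len(self.occ) + sum(1 for f in doc if f not in self.occ)
        return (self.size + len(doc)) * (self.n + 1) / width ** r - self.value(r)

    def add(self, doc):
        self.n += 1
        self.size += len(doc)
        self.occ.update(doc)

    def remove(self, doc):
        self.n -= 1
        self.size -= len(doc)
        for f in doc:
            self.occ[f] -= 1
            if self.occ[f] == 0:
                del self.occ[f]  # keep len(self.occ) equal to the true width


def clope(docs, r=2.0, max_passes=20):
    """Return one cluster label per document; docs are non-empty feature sets."""
    clusters = []
    labels = [None] * len(docs)
    for _ in range(max_passes):
        moved = False
        for i, doc in enumerate(docs):
            old = labels[i]
            if old is not None:
                clusters[old].remove(doc)
            if not clusters or clusters[-1].n > 0:
                clusters.append(Cluster())  # always offer an empty cluster
            # Pick the best gain; on ties, prefer staying in the old cluster.
            best = max(
                range(len(clusters)),
                key=lambda k: (clusters[k].delta_add(doc, r), k == old),
            )
            clusters[best].add(doc)
            labels[i] = best
            moved = moved or best != old
        if not moved:
            break
    return labels


# Hypothetical feature sets: two documents share a/b, two share x/y.
docs = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}]
labels = clope(docs)
```

Larger values of r make clusters tighter, since adding a document that widens a cluster is penalised more heavily; labels are cluster indices and need not be contiguous.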
Keywords :
data mining; document handling; pattern clustering; trees (mathematics); CLOPE algorithm; PBClustering; TBCClustering; clustering GML document; maximal frequent induced subtree; optimistic clustering feature; structural information; Algorithm design and analysis; Clustering algorithms; Computers; Data mining; Databases; Encoding; XML; Clustering; GML document mining; Induced subtree; Maximal frequent subtree;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
DOI :
10.1109/FSKD.2010.5569321