DocumentCode :
2028912
Title :
Clustering GML documents using maximal frequent induced subtrees
Author :
Zhu, Ying-wen ; Ji, Gen-lin ; Sun, Qin-hong
Author_Institution :
Dept. of Comput. Found. Teaching, Sanjiang Univ., Nanjing, China
Volume :
5
fYear :
2010
fDate :
10-12 Aug. 2010
Firstpage :
2265
Lastpage :
2269
Abstract :
This paper presents TBCClustering, an algorithm for clustering GML documents using maximal frequent induced subtree patterns. TBCClustering mines the maximal frequent induced subtrees from the structural information of GML documents, determines the best minimum support automatically, and then chooses a set of subtree patterns to form the optimistic clustering features. Finally, it applies the CLOPE algorithm to cluster the GML documents on these features, without requiring the number of clusters to be specified in advance. Experimental results show that TBCClustering is more effective and efficient than PBClustering.
Keywords :
data mining; document handling; pattern clustering; trees (mathematics); CLOPE algorithm; PBClustering; TBCClustering; clustering GML document; maximal frequent induced subtree; optimistic clustering feature; structural information; Algorithm design and analysis; Clustering algorithms; Computers; Data mining; Databases; Encoding; XML; Clustering; GML document mining; Induced subtree; Maximal frequent subtree;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
Type :
conf
DOI :
10.1109/FSKD.2010.5569321
Filename :
5569321