Title :
Razor: mining distance-constrained embedded subtrees
Author :
Tan, Henry ; Dillon, Tharam S. ; Hadzic, Fedja ; Chang, Elizabeth
Author_Institution :
Fac. of Inf. Technol., Univ. of Technol., Sydney, NSW
Abstract :
Our work focuses on mining frequent subtrees from a database of rooted ordered labeled subtrees. We previously developed an efficient algorithm, MB3 (Tan et al., 2005), for mining frequent embedded subtrees from a database of rooted, labeled, and ordered subtrees. Its efficiency comes from a novel embedding list representation for tree model guided (TMG) candidate generation. As an extension, the IMB3 algorithm (Tan et al., 2006) introduces the level of embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance-constrained embedded tree mining will have important applications in Web information systems, conceptual model analysis, and more sophisticated ontology matching. Domains that represent their knowledge in a tree-structured form may require this additional distance information, as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. Structure-based approaches to schema matching commonly take the distance among the concept nodes within a sub-structure into account when evaluating concept similarity across different schemas. We present an encoding strategy to efficiently enumerate candidate subtrees, taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique.
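The effect of the distance constraint can be illustrated with a minimal sketch. It is not the Razor encoding or the embedding list representation, which the abstract does not specify; it only enumerates two-node ancestor-descendant patterns by brute force. The tree representation `(label, [children])` and the helper names `descendants_with_distance`, `two_node_patterns`, and `support` are assumptions introduced here for illustration.

```python
from collections import Counter

# Assumed representation for illustration: a tree is (label, [children]).

def descendants_with_distance(node, dist=0):
    """Yield (label, distance) for every proper descendant of node, in pre-order."""
    _, children = node
    for child in children:
        yield child[0], dist + 1
        yield from descendants_with_distance(child, dist + 1)

def two_node_patterns(tree, use_distance):
    """Return the set of ancestor-descendant patterns found in one tree.

    With use_distance=True, a pattern also records the descendant's
    distance from the pattern root, so the same label pair at different
    distances counts as different patterns.
    """
    patterns = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        for label, dist in descendants_with_distance(node):
            key = (node[0], label, dist) if use_distance else (node[0], label)
            patterns.add(key)
        stack.extend(node[1])
    return patterns

def support(database, use_distance):
    """Count, for each pattern, the number of trees that contain it."""
    counts = Counter()
    for tree in database:
        counts.update(two_node_patterns(tree, use_distance))
    return counts

# Two small trees: a -> b -> c, and a -> c.
t1 = ("a", [("b", [("c", [])])])
t2 = ("a", [("c", [])])
database = [t1, t2]

plain = support(database, use_distance=False)
constrained = support(database, use_distance=True)

print(plain[("a", "c")])           # 2
print(constrained[("a", "c", 1)])  # 1
print(constrained[("a", "c", 2)])  # 1
```

Without distance information, "a with descendant c" is supported by both trees. Once the distance from the pattern root is recorded, the pattern splits into two candidates, each supported by a single tree.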
Keywords :
data mining; ontologies (artificial intelligence); tree data structures; Razor; Web information systems; association mining; conceptual model analysis; distance constrained embedded tree mining; distance information; distance-constrained embedded subtrees; embedding list representation; frequent subtree mining; ontology matching; rooted ordered labeled subtrees; schema matching; structure matching; tree model guided candidate generation; tree structured form; Australia; Data mining; Databases; Electronic mail; Encoding; Information analysis; Information systems; Information technology; Ontologies; Tree graphs;
Conference_Title :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
DOI :
10.1109/ICDMW.2006.138