Title :
Efficiently mining frequent trees in a forest: algorithms and applications
Author :
Zaki, Mohammed J.
Author_Institution :
Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
Abstract :
Mining frequent trees is very useful in domains like bioinformatics, Web mining, mining semistructured data, etc. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER), and we also compare it with TREEMINERD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from bioinformatics domain.
Keywords :
data mining; macromolecules; pattern matching; tree data structures; PATTERNMATCHER; RNA structure; TREEMINER; bioinformatics domain; data mining; data structure; frequent tree mining; labeled trees; pattern matching; phylogenetics data sets; scope-list; Bioinformatics; Data mining; Databases; Pattern matching; Phylogeny; RNA; Testing; Tree data structures; Tree graphs; Web mining; Index Terms- Frequent tree mining; RNA structure; data mining.; labeled trees; ordered; pattern matching; phylogenetic trees; rooted; subtree enumeration;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.125